LLM Foundations Part 2: The Art of Prompt Engineering

Introduction

In Part 1 of our series, we explored the architecture and training of modern Large Language Models (LLMs). We learned that these models are trained to do one simple thing: predict the next word. Yet, from this simple objective, incredible abilities emerge.

But how do we harness these abilities? How do we steer a model that understands nearly the entire internet towards a specific, desired output? The answer lies in Prompt Engineering: the art and science of designing effective inputs (prompts) to guide an LLM.

If the LLM is a brilliant, improvisational actor, the prompt engineer is the director, providing the script, context, and motivation to get the perfect performance. The code you’ve worked with, such as the LLM class with its ask method that carefully formats messages, is a practical implementation of this director’s role.

This guide will cover the essential techniques of this new and critical skill:

  1. The core prompting techniques: Zero-Shot and Few-Shot prompting.
  2. An advanced technique for reasoning: Chain-of-Thought.
  3. Actionable best practices for crafting effective prompts.

1. The Anatomy of a Prompt

While a prompt can be as simple as a single question, a well-structured prompt often contains several key components, combined into a single prompt in the short sketch after this list:

  • Instruction: The specific task you want the model to perform (e.g., “Summarize the following text.”).
  • Context: External information the model needs to complete the task (e.g., the text to be summarized).
  • Input Data: The specific item or question to act upon.
  • Output Indicator: The desired format for the output (e.g., “Output the summary as a JSON object with the key ‘summary’.”).
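
As a concrete illustration, the short Python sketch below assembles the four components into a single prompt string. The variable names and the example text are illustrative placeholders, not part of any particular library.

# Assemble the four prompt components into one string.
# All names and example text below are illustrative placeholders.
instruction = "Answer the question using only the provided context."
context = "The Eiffel Tower was completed in 1889 for the Paris World's Fair."
input_data = "When was the Eiffel Tower completed?"
output_indicator = "Respond with a single short sentence."

prompt = f"""{instruction}

Context:
{context}

Question: {input_data}

{output_indicator}"""

print(prompt)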

2. Core Prompting Techniques

a. Zero-Shot Prompting

This is the most basic form of prompting. You ask the model to perform a task directly, without providing any prior examples of how to do it. It relies entirely on the model’s vast pre-trained knowledge.

Example:

Classify the sentiment of the following review as positive, negative, or neutral.

Review: "I was pleasantly surprised by the battery life of this laptop."
Sentiment:

A powerful LLM can easily handle this because it has seen countless examples of sentiment classification during its pre-training.
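
As a rough sketch of how such a zero-shot prompt can be sent in code, here is a call through the OpenAI Python client. The openai package, the OPENAI_API_KEY environment variable, and the model name are assumptions about your setup, not requirements of the technique.

# Zero-shot sentiment classification via the OpenAI chat completions API.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Classify the sentiment of the following review as positive, negative, or neutral.\n\n"
    'Review: "I was pleasantly surprised by the battery life of this laptop."\n'
    "Sentiment:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; use whichever chat model you have
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # keep the classification output stable
)

print(response.choices[0].message.content)  # e.g. "positive"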

b. Few-Shot Prompting

In few-shot prompting, you provide a few examples (or “shots”) of the task within the prompt itself. This is a form of in-context learning, where you show the model exactly what you want, guiding its output format and style.

Example:

Classify the sentiment of the following reviews as positive, negative, or neutral.

Review: "The movie was a bit too long for my taste."
Sentiment: neutral

Review: "I absolutely loved the acting and the storyline!"
Sentiment: positive

Review: "The service was slow and the food was cold."
Sentiment: negative

Review: "I was pleasantly surprised by the battery life of this laptop."
Sentiment:

By providing examples, you’ve made the task clearer and constrained the model to output one of the three desired labels. The JSON data for Behavioral Cloning you’ve seen, with its {"from": "human", "value": ...} and {"from": "gpt", "value": ...} pairs, follows the same show-by-example idea, except that those demonstrations are baked into the model’s weights through fine-tuning rather than supplied in the prompt at inference time.
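
In chat-style APIs, a common way to express these in-context examples is as alternating user/assistant turns. The sketch below only builds the message list; it can be sent with client.chat.completions.create(...) exactly as in the zero-shot sketch above, and the system line is an assumed convenience rather than a requirement.

# Few-shot sentiment classification expressed as alternating user/assistant
# turns. The final user turn is the actual query to be classified.
few_shot_messages = [
    {"role": "system", "content": "Classify review sentiment as positive, negative, or neutral."},
    {"role": "user", "content": 'Review: "The movie was a bit too long for my taste."'},
    {"role": "assistant", "content": "neutral"},
    {"role": "user", "content": 'Review: "I absolutely loved the acting and the storyline!"'},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": 'Review: "The service was slow and the food was cold."'},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": 'Review: "I was pleasantly surprised by the battery life of this laptop."'},
]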

3. Advanced Prompting: Chain-of-Thought (CoT)

One of the most significant breakthroughs in prompt engineering is Chain-of-Thought (CoT) prompting. It dramatically improves an LLM’s performance on tasks that require logical reasoning, such as math word problems or logic puzzles.

The core idea is to prompt the model not just for the final answer, but to “think step-by-step” and lay out its reasoning process. This mimics how humans solve complex problems—by breaking them down.

Example: Without CoT

Prompt:

Q: John has 5 apples. He gives 2 to his friend and then buys 3 more. How many apples does he have?
A:

Model Output: 7 (Incorrect)

Example: With Few-Shot CoT

Prompt:

Q: A juggler has 10 balls. He drops 3, then picks up 2. How many balls is he juggling?
A: The juggler starts with 10 balls. He drops 3, so he has 10 - 3 = 7 balls. He picks up 2, so he now has 7 + 2 = 9 balls. The answer is 9.

Q: John has 5 apples. He gives 2 to his friend and then buys 3 more. How many apples does he have?
A:

Model Output: John starts with 5 apples. He gives 2 away, so he has 5 - 2 = 3 apples. He then buys 3 more, so he has 3 + 3 = 6 apples. The answer is 6. (Correct reasoning and answer)

You can often achieve this even with Zero-Shot CoT, by simply appending the magic phrase "Let's think step by step." to your prompt.
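
With the API call structured as in the earlier sketches, Zero-Shot CoT is a one-line change to the prompt. The sketch below only shows that construction; the resulting string goes in the user message as before.

# Zero-shot Chain-of-Thought: take any question and append the reasoning trigger.
question = (
    "John has 5 apples. He gives 2 to his friend and then buys 3 more. "
    "How many apples does he have?"
)
cot_prompt = f"Q: {question}\nA: Let's think step by step."
print(cot_prompt)  # send this as the user message, as in the earlier sketches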

4. Best Practices for Crafting Effective Prompts

  1. Be Specific and Clear: Avoid ambiguity. Instead of “Write about dogs,” use “Write a 300-word blog post about the benefits of daily walks for golden retrievers.”
  2. Use Personas: Assigning a role to the model focuses its knowledge. For example: "Act as a senior software architect. Review the following code for potential scalability issues."
  3. Use Delimiters: Use characters like ###, """, or XML tags (<context>, </context>) to clearly separate instructions from context and input data. This helps the model distinguish between different parts of your prompt.
  4. Specify the Output Format: Don’t leave the output structure to chance. Explicitly ask for the format you need. Example: "Extract the key people and locations from the text below. Provide the output as a JSON object with two keys: 'people' and 'locations'." (The sketch after this list combines this practice with a persona and delimiters.)
  5. Provide Context: Don’t assume the model has all the necessary information. If you’re asking about a specific document or conversation, include it in the prompt. (This is the core idea behind RAG, which we’ll cover in Part 4).
  6. Iterate, Iterate, Iterate: Your first prompt is rarely your best. Prompt engineering is an iterative process of testing, analyzing the output, and refining the prompt to get closer to your desired result.
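
The sketch below combines several of these practices in one call: a persona in the system message, XML-style delimiters around the input data, and an explicit JSON output request reinforced by the API's JSON mode. The model name and the sample article are assumptions for illustration.

# Combining best practices: persona, delimiters, and an explicit output format.
from openai import OpenAI

client = OpenAI()

article = "Ada Lovelace worked with Charles Babbage in London on the Analytical Engine."

messages = [
    # Persona and high-level instruction live in the system message.
    {"role": "system", "content": "Act as a careful news analyst. Answer only from the provided text."},
    # Delimiters (<context> tags) separate the input data from the instruction.
    {
        "role": "user",
        "content": (
            "Extract the key people and locations from the text in <context>.\n"
            "Provide the output as a JSON object with two keys: 'people' and 'locations'.\n\n"
            f"<context>\n{article}\n</context>"
        ),
    },
]

response = client.chat.completions.create(
    model="gpt-4o-mini",                      # assumed model name
    messages=messages,
    response_format={"type": "json_object"},  # have the API enforce valid JSON
    temperature=0,
)

print(response.choices[0].message.content)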

5. Prompts in Practice: A Look at the Code

The LLM class you’ve worked with, which includes methods like ask and ask_with_images, provides a perfect window into how these concepts are implemented.

The format_messages method is particularly insightful. It takes a list of messages and converts them into the structured format required by models like GPT-4. This format uses a list of dictionaries, each with a role and content:

  • {"role": "system", "content": "You are a helpful assistant."}: This sets the persona and high-level instructions.
  • {"role": "user", "content": "What is the capital of France?"}: This is the user’s direct input.
  • {"role": "assistant", "content": "The capital of France is Paris."}: This is a previous response from the model, used to provide conversational history (a form of few-shot learning).

This structured approach is far more robust than a single string prompt, as it clearly delineates who said what, allowing the model to better understand the flow of a conversation. The ask method then takes this formatted list and sends it to the OpenAI API, completing the cycle from prompt design to model execution.
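
The author's full class is not reproduced here, but a stripped-down, hypothetical wrapper along the same lines might look like the following. The method names mirror the description above; everything else, including the default model and system prompt, is an assumption.

# Hypothetical, minimal wrapper in the spirit of the class described above;
# not the actual implementation from this series.
from openai import OpenAI


class LLM:
    def __init__(self, model: str = "gpt-4o-mini",
                 system_prompt: str = "You are a helpful assistant."):
        self.client = OpenAI()          # reads OPENAI_API_KEY from the environment
        self.model = model              # assumed default model name
        self.system_prompt = system_prompt

    def format_messages(self, messages: list[dict]) -> list[dict]:
        """Prepend the system persona, then pass through the role/content dicts."""
        return [{"role": "system", "content": self.system_prompt}, *messages]

    def ask(self, messages: list[dict]) -> str:
        """Format the conversation and send it to the chat completions endpoint."""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.format_messages(messages),
        )
        return response.choices[0].message.content


# Usage: earlier assistant turns can be included to carry conversational history.
llm = LLM()
print(llm.ask([{"role": "user", "content": "What is the capital of France?"}]))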

Conclusion

Prompt engineering is the bridge between human intent and machine execution. It’s a skill that blends creativity, logic, and empirical testing. By mastering the techniques of zero-shot, few-shot, and chain-of-thought prompting, and by following clear best practices, you can unlock the full potential of these powerful language models.

In Part 3 of our series, we will explore the ecosystem of tools—like Hugging Face, LangChain, and vector databases—that help us manage and scale these prompting techniques into full-fledged applications.

Suggested Reading

  • OpenAI’s Prompt Engineering Guide: An excellent and practical guide from the creators of GPT.
  • “Prompt Engineering Guide” by DAIR.AI: A comprehensive, open-source guide covering a wide range of advanced techniques.
  • Lil’Log’s Blog Post on Prompting: A great overview of prompting research and techniques.