Mastering Long Document Insights: Advanced Summarization with Amazon Bedrock and Anthropic Claude 2 Foundation Model

Gary Stafford

Principal Solutions Architect @AWS | Data Analytics and Generative AI Specialist | Experienced Technology Leader, Consultant, CTO, COO, President | 10x AWS Certified

Published Oct 29, 2023

Unleash the Power of Generative AI for Comprehensive Document Analysis and Summarization

Text summarization is a standard Natural Language Processing (NLP) task that creates a concise summary of a longer text. According to LangChain, with foundation models, if the text is too long to stuff into a single prompt, we could use techniques like Map Reduce to summarize each document in a 'map' step and then 'reduce' the summaries into a final summary. This technique is commonly referred to as a 'summary of summaries.' In this post, we will move beyond simple summarization and explore advanced techniques to analyze long texts using Amazon Bedrock and the Anthropic Claude 2 Foundation Model.

In the spirit of the season 🎃, we will analyze Bram Stoker's original 1897 Victorian-era Gothic novel, Dracula, freely available through Project Gutenberg. Written in the epistolary style, the novel is based on diary entries, telegrams, letters, and memos, providing an additional level of complexity when analyzing the longer text. We will explore elements such as the novel's context, setting, characters, plot, literary devices, and themes aided by Generative AI.

Source Code

The source code used in this post's demonstration is open-sourced and available on GitHub. I suggest starting with the project's Jupyter Notebook, which contains all the examples herein plus additional examples not included in the post.

Amazon Bedrock and Anthropic Claude

AWS announced the general availability of Amazon Bedrock in late September 2023. Amazon Bedrock is a fully managed serverless service that makes foundation models (FMs) from leading AI companies, such as Cohere, Anthropic, Stability AI, Meta, and AI21 Labs, available through an application programming interface (API). The Anthropic Claude 2 foundation model employed in this post was accessed through Amazon Bedrock using the Boto3 SDK for Python.

What is a Prompt?

Quoting Wikipedia, prompt engineering is the process of structuring text that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform. Further, citing Prompt Engineering Guide's Elements of a Prompt, a prompt can contain any of these four types of elements:

Instruction: a specific task or instruction you want the model to perform
Context: external information or additional context that can steer the model to better responses
Input Data: the input or question that we are interested in finding a response for
Output Indicator: the type or format of the output

You do not need all four elements for a prompt, and the format depends on the task and the model provider.

Model-specific Prompt Engineering

Generally, each model provider has some required or recommended methods for constructing an optimal prompt for their particular foundation model. For example, according to Anthropic's guide, Using the correct format, Claude was trained as a conversational agent using special tokens to mark who is speaking. The \n\nHuman: (you) asks a question or gives instructions and the \n\nAssistant: (Claude) responds. Additionally, according to Anthropic's guide, Constructing a prompt, XML tags, like <tag></tag>, help demarcate some essential parts of your prompt, such as rules, examples, or input text to process. Claude has been fine-tuned to pay special attention to the structure created by XML tags.
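To make the two conventions concrete, here is a minimal sketch that combines the \n\nHuman: and \n\nAssistant: markers with an XML-style tag. The helper name build_claude_prompt and its parameters are illustrative assumptions, not part of Anthropic's guides or this post's source code.

```python
def build_claude_prompt(instruction, tag, content):
    """Wrap an instruction and tagged input text in the Human/Assistant format."""
    return (
        f"\n\nHuman: {instruction}\n\n"
        f"<{tag}>\n{content}\n</{tag}>"
        "\n\nAssistant:"
    )


# Placeholder content; a real call would pass the chapter text here
prompt = build_claude_prompt(
    instruction="Summarize the text inside the <chapter></chapter> XML tags below.",
    tag="chapter",
    content="(chapter text goes here)",
)
print(prompt)
```

Keeping the markers in one helper makes it harder to forget the leading newlines or the trailing \n\nAssistant: when writing many prompt variations.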

Text Splitting

To summarize a long text whose length exceeds the context window (aka token limit) of a foundation model, the text must be split into what are commonly referred to as chunks. Even Anthropic Claude 2, with its massive 100K token context window, cannot ingest the entire novel, whose 27 chapters total approximately 833K characters, 153K words, or 206K tokens.

Splitting or chunking a long text into smaller chunks is more challenging than it may sound. You could somewhat mindlessly set an arbitrary character chunk size and the chunk overlap. However, as LangChain's Text splitter documentation states, ideally, you want to keep the semantically related pieces of text together. What semantically related means could depend on the type of text. Luckily enough, our novel contains 27 chapters with an average length of 75 paragraphs, 5,683 words, 30,874 characters, or 7,628 tokens. A chapter is defined as "a main division of a book." Thus, each chapter should contain semantically related pieces of text.

We can use simple Regular Expressions to split the chapters and remove any extraneous content before or after the novel's chapters:

import re

def split_book(book_text):
    # Use regular expressions to split the book into chapters
    # Specific to this Gutenberg eBooks format
    chapters = re.split(r"^CHAPTER [IVXLCDM]+$", book_text, flags=re.MULTILINE)
    
    # remove everything prior to chapter 1
    chapters.pop(0)

    # Split the last chapter into two parts and remove everything after "THE END"
    chapter26 = re.split(r"^.*THE END.*$", chapters[26], flags=re.MULTILINE)[0]
    chapters.pop(26)
    chapters.append(chapter26)

    return chapters
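To see how the heading pattern behaves, here is a minimal sketch run against a tiny synthetic text rather than the actual eBook. The function name split_chapters and the sample text are illustrative assumptions; unlike split_book above, this variant does not assume exactly 27 chapters.

```python
import re

CHAPTER_HEADING = re.compile(r"^CHAPTER [IVXLCDM]+$", flags=re.MULTILINE)


def split_chapters(book_text):
    """Split text on 'CHAPTER <roman numeral>' heading lines (illustrative helper)."""
    parts = CHAPTER_HEADING.split(book_text)
    # parts[0] is the front matter before the first heading, so drop it
    chapters = parts[1:]
    if chapters:
        # Keep only the text before the 'THE END' line in the final chapter
        chapters[-1] = re.split(
            r"^.*THE END.*$", chapters[-1], maxsplit=1, flags=re.MULTILINE
        )[0]
    return [chapter.strip() for chapter in chapters]


# Synthetic sample, not text from the novel
sample = """Front matter

CHAPTER I

First chapter text.

CHAPTER II

Second chapter text.

THE END

Back matter
"""

chapters = split_chapters(sample)
print(len(chapters))
print(chapters[1])
```

The re.MULTILINE flag is what lets ^ and $ match at each line boundary, so only lines consisting solely of a chapter heading trigger a split.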

What are Tokens?

Anthropic defines tokens as the smallest individual "atoms" of a language model, which can variously correspond to words, subwords, characters, or even bytes in the case of Unicode. For Claude, the average token is about 3.5 characters. Using the client.count_tokens() method in Anthropic's Python library, we can compute the average input token length of the chapters in the novel to be 7,628, with an average character-to-token ratio of 4.05:1, in the same ballpark as the Anthropic estimate of 3.5:1.

from anthropic import Anthropic

client = Anthropic()
chapter_token_count = client.count_tokens(chapter.strip())
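When the SDK is not at hand, the measured ratio can give a rough offline estimate. The sketch below assumes the 4.05 characters-per-token average reported in this post; the function names and the reserved_for_output margin are hypothetical, and the result is only an approximation of what count_tokens() would return.

```python
import math

# Mean characters-per-token ratio measured across the chapters in this post
CHARS_PER_TOKEN = 4.05


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Roughly estimate the token count from the character count."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(text, context_window=100_000, reserved_for_output=1_000):
    """Check whether the estimated prompt size leaves room for the response."""
    return estimate_tokens(text) <= context_window - reserved_for_output


# Chapter 1 has 30,624 characters; its measured count was 7,218 tokens
print(estimate_tokens("x" * 30_624))
print(fits_context("x" * 833_600))  # all 27 chapters at once
```

Because the ratio varies by chapter (3.85 to 4.24 in this post), the estimate can be off by several percent, so leave headroom rather than packing a prompt to the limit.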

Based on that average input token count, we could easily pass the contents of several chapters at one time into the Anthropic Claude 2 foundation model due to its 100K token context window. However, instead of figuring out other ways to split the novel and maintain the semantic relationship within chunks, we will use the individual chapters for all examples in this post.

chpt  paras    words     chrs   tokens    ratio     cost
--------------------------------------------------------
   1     39    5,547   30,624    7,218     4.24    0.080
   2     62    5,305   28,510    6,833     4.17    0.075
   3     46    5,571   29,805    7,075     4.21    0.078
   4     86    5,703   30,267    7,338     4.12    0.081
   5     28    3,390   18,019    4,650     3.88    0.051
   6     64    5,299   29,195    7,524     3.88    0.083
   7     62    5,424   29,964    7,120     4.21    0.078
   8     59    6,044   32,637    7,970     4.09    0.088
   9     64    5,709   30,180    7,477     4.04    0.082
  10    100    5,623   30,817    7,706     4.00    0.085
  11     78    4,754   26,991    7,014     3.85    0.077
  12     95    6,993   37,944    9,372     4.05    0.103
  13    107    6,242   34,198    8,490     4.03    0.094
  14     97    6,053   32,612    8,277     3.94    0.091
  15    101    5,485   29,787    7,509     3.97    0.083
  16     62    4,381   23,928    5,895     4.06    0.065
  17     80    5,264   29,074    7,140     4.07    0.079
  18     84    6,615   35,948    8,924     4.03    0.098
  19     46    5,505   29,462    7,041     4.18    0.078
  20    104    5,467   31,241    7,883     3.96    0.087
  21     69    5,905   32,220    7,940     4.06    0.087
  22     65    5,249   28,130    6,822     4.12    0.075
  23     84    5,403   29,551    7,351     4.02    0.081
  24     75    6,057   32,123    7,924     4.05    0.087
  25     87    5,907   32,612    8,153     4.00    0.090
  26    106    6,818   37,084    9,266     4.00    0.102
  27     78    7,733   40,677   10,055     4.05    0.111

---
Raw book text
---
sum paras:	   2,040
sum chars:	 856,545
sum words:	 658,827
sum tokens:	 211,209


Chapters
---
chpt count:	      27
---
min paras:	      28
max paras:	     107
mean paras:	      75
sum paras:	   2,028
---
min words:	   3,390
max words:	   7,733
mean words:	   5,683
sum words:	 153,446
---
min chars:	  18,019
max chars:	  40,677
mean chars:	  30,874
sum chars:	 833,600
---
min tokens:	   4,650
max tokens:	  10,055
mean tokens:	   7,628
sum tokens:	 205,967
---
min chrs/tkn:	    3.85
max chrs/tkn:	    4.24
mean chrs/tkn:	    4.05
---
min cost:	$  0.051
max cost:	$  0.111
mean cost:	$  0.084
sum cost:	$  2.270
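The ratio and cost columns are simple arithmetic on the character and token counts. The sketch below reproduces a few of the figures; the rate of $0.01102 per 1,000 input tokens is an assumption that happens to be consistent with the cost column above, it covers input tokens only, and current pricing should be checked before relying on it.

```python
# Assumed rate in USD per 1,000 input tokens (consistent with the cost column,
# but not confirmed by the post; output tokens are not included)
INPUT_PRICE_PER_1K_TOKENS = 0.01102


def input_cost(tokens, price_per_1k=INPUT_PRICE_PER_1K_TOKENS):
    """Estimate the input-side cost, in USD, of sending `tokens` tokens."""
    return tokens / 1000 * price_per_1k


def chars_per_token(chars, tokens):
    """Character-to-token ratio, as reported in the ratio column."""
    return chars / tokens


# Values for chapter 1 and for all 27 chapters, taken from the figures above
print(f"{chars_per_token(30_624, 7_218):.2f}")  # 4.24
print(f"{input_cost(7_218):.3f}")               # 0.080
print(f"{input_cost(205_967):.3f}")             # 2.270
```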

Analysis Techniques

In the following section, we will examine several methods for analyzing the long text of the novel. As we progress through the examples, we will increase the complexity of the prompts, adding techniques to shape the output more precisely.

1. Plot Summarization

A simple summary of summaries is one of the most common examples of text summarization of long texts. In this example, we first loop through each chapter and ask the foundation model to write a concise single-paragraph summary of the main points, events, and ideas contained in the chapter.

First, let's look at a simple prompt that we could pass to the Anthropic Claude model, which should return a summary:

prompt = f"""\n\nHuman: Write a short summary of the following chapter:

    {chapter}.\n\nAssistant:"""

The challenge with this basic prompt is that we don't know how large a response we will get back or how it will be formatted. By adding more instructions to the prompt, we can increase the probability of getting back the type of response we expect from the model:

prompt = f"""\n\nHuman: Write a concise, grammatically correct, single-paragraph summary of the following chapter:
    
    {chapter}\n\nAssistant:"""

In addition to adding even more instructions to the prompt, we can also use XML-style tags, such as <chapter></chapter>, which Anthropic recommends for their Claude models to demarcate sections of the prompt better.

prompt = f"""\n\nHuman: Write a concise, grammatically correct, single-paragraph summary of the chapter's main points, events, and ideas contained inside the <chapter></chapter> XML tags below.  
    
    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

This prompt is then passed into the following function, which calls the Anthropic Claude 2 foundation model through Amazon Bedrock:

import json
import boto3
from botocore.exceptions import ClientError

def create_bedrock_connection(self):
    client_bedrock = boto3.client(
        "bedrock-runtime", 
        "us-east-1"
    )
    
    return client_bedrock


def create_summary(self, client_bedrock, prompt):
    try:
        body = json.dumps(
            {
                "prompt": prompt,
                "max_tokens_to_sample": self.max_tokens_to_sample,
                "temperature": self.temperature,
                "top_k": self.top_k,
                "top_p": self.top_p,
                "stop_sequences": self.stop_sequences,
            }
        )

        accept = "application/json"
        content_type = "application/json"

        response = client_bedrock.invoke_model(
            body=body, 
            modelId=self.model, 
            accept=accept, 
            contentType=content_type
        )
        response_body = json.loads(response.get("body").read())
        return response_body
    except ClientError as ex:
        # Log the AWS error, then stop with a non-zero exit status
        self.logger.error(ex)
        raise SystemExit(1)
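The function above returns the whole response body; for Claude 2's text-completion format, the generated text sits under the "completion" key. The following is a minimal standalone sketch of the same call that extracts that text. The function name invoke_claude, the default parameter values, and the FakeBedrockClient stand-in (used so the sketch runs without AWS credentials) are all assumptions for illustration.

```python
import io
import json


def invoke_claude(client_bedrock, prompt, model_id="anthropic.claude-v2",
                  max_tokens_to_sample=500, temperature=0.3):
    """Send a text-completion request to Claude 2 and return the completion text."""
    body = json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens_to_sample,
        "temperature": temperature,
    })
    response = client_bedrock.invoke_model(
        body=body,
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )
    response_body = json.loads(response["body"].read())
    # The generated text is carried under the "completion" key
    return response_body["completion"].strip()


class FakeBedrockClient:
    """Stand-in for the boto3 client, returning a canned response."""

    def invoke_model(self, body, modelId, accept, contentType):
        payload = {"completion": " A canned summary.", "stop_reason": "stop_sequence"}
        return {"body": io.BytesIO(json.dumps(payload).encode("utf-8"))}


summary = invoke_claude(FakeBedrockClient(), "\n\nHuman: Summarize...\n\nAssistant:")
print(summary)
```

With real credentials, the fake client would be replaced by boto3.client("bedrock-runtime", "us-east-1"), as in create_bedrock_connection above.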

Summary of Summaries

Here is an example of the model's summary of Chapter 1 of the novel:

"Jonathan Harker travels by train from Munich to Bistritz, where he stays at the Golden Krone Hotel before departing for the Borgo Pass. He is met by a mysterious driver in a calèche who takes him through the Pass at night. Harker sees strange sights along the way, including blue flames and a ring of wolves surrounding them. Despite feeling afraid, Harker continues on to a ruined castle, arriving just before midnight. The chapter covers Harker's journey to Transylvania and the foreboding events he witnesses that hint at the supernatural horrors that await him at Count Dracula's castle."

Image from Midjourney using the novel's generated summary

Using a nearly identical prompt, we can then pass the individual chapter summaries, stored in a single text file, to the same function to derive a summary of summaries, that is, a summary of the complete novel. Note that we have added directions for the Assistant (Claude) to further ensure the expected output:

prompt = f"""\n\nHuman: Write a concise grammatically correct summary using only the information within the <text></text> XML tags below. 
    The Assistant will refrain from using bullet-point lists.
    
    <text>
    {input_text}
    </text>\n\nAssistant:"""

We should receive a response similar to the following based on a summarization of the individual chapter summaries:

"The novel follows the harrowing journey of Jonathan Harker and his companions to defeat Count Dracula after realizing he is an immortal vampire. It begins with Harker's ominous imprisonment in Dracula's remote Transylvanian castle, where he first encounters the supernatural horror. Back in England, his fiancée Mina and her friend Lucy become targets of Dracula's evil, as signs point to his arrival in London. Led by Professor Van Helsing, Harker joins Lucy's suitors and doctors, John Seward and Van Helsing, to share information and hunt Dracula. After Lucy tragically dies and returns as a vampire, the men heroically destroy her. They race to find Dracula's lairs across London and sterilize his boxes of native earth to weaken him. Their pursuit of Dracula leads them back to Transylvania, where they finally defeat him through a violent battle at his castle. But not before a bitten Mina faces her own dark transformation, and Quincey Morris sacrifices his life, striking the final blow to Dracula's heart. United by love and courage, the characters embrace the reality of supernatural evil in the world, finding the strength to conquer it."

Image from Midjourney using the description of the novel's setting in London
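The two-step flow described above, summarizing each chapter and then summarizing the joined summaries, can be sketched as follows. This is a simplified illustration, not the project's code: the function name summarize_book is hypothetical, the prompts are shortened versions of the ones in this post, and the model call is injected as a plain callable so the sketch runs offline with a placeholder.

```python
def summarize_book(chapters, summarize):
    """Map: summarize each chapter. Reduce: summarize the joined summaries."""
    chapter_summaries = []
    for chapter in chapters:
        prompt = (
            "\n\nHuman: Write a concise, grammatically correct, single-paragraph "
            "summary of the chapter inside the <chapter></chapter> XML tags below."
            f"\n\n<chapter>\n{chapter}\n</chapter>\n\nAssistant:"
        )
        chapter_summaries.append(summarize(prompt))

    input_text = "\n\n".join(chapter_summaries)
    final_prompt = (
        "\n\nHuman: Write a concise grammatically correct summary using only the "
        "information within the <text></text> XML tags below."
        f"\n\n<text>\n{input_text}\n</text>\n\nAssistant:"
    )
    return chapter_summaries, summarize(final_prompt)


def fake_summarize(prompt):
    """Placeholder for the model call: reports the prompt length only."""
    return f"summary of {len(prompt)} characters"


chapter_summaries, book_summary = summarize_book(
    ["Chapter one.", "Chapter two."], fake_summarize
)
print(len(chapter_summaries))
print(book_summary)
```

Passing the model call in as a parameter keeps the loop testable and lets the same function drive the character, setting, or literary-device prompts later in the post.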

We can modify the prompt if we desire a bullet-point list-format summary of each chapter versus the previous paragraph format. Note we are further reinforcing the use of the context contained between the <text></text> tags in the instructions:

prompt = f"""\n\nHuman: Write a bullet-point list of 3-4 main points, events, and ideas using only the information within the <text></text> XML tags below.
    
    <text>
    {input_text}
    </text>\n\nAssistant:"""

For Chapter 1, the results should look similar to the following:

Here are the main points from the chapter:

Jonathan Harker travels by train from Munich to Bistritz, entering the region of Transylvania. He notes the changing landscape and glimpses the cultures/peoples that inhabit the region.
In Bistritz, the locals seem wary of Harker's intended destination and make signs to ward off evil. Harker brushes this off as superstition.
Harker stays at the Golden Krone Hotel. He receives a note from Count Dracula welcoming him and mentioning a carriage ride.
During the carriage ride into the mountains, the driver seems increasingly anxious/eager. Harker sees a blue flame in the darkness that spooks the horses.
The howling of wolves surrounds them. The driver mysteriously disappears and returns.
They arrive at a ruined castle, assumed to be Dracula's, under the moonlight. The scene is gloomy and ominous.

2. Character Analysis

In addition to a simple plot summary, we can analyze the context, setting, characters, literary devices, and themes aided by Generative AI. Next, we will focus on the main characters from the novel. To reduce the output size, we have asked the model only to return 3-4 main characters from each chapter. Note how we have added two additional sections, the <template> and <example> sections. These two sections will help reinforce the format of the model's response. In the template, we use the square brackets to indicate placeholders the model will replace (e.g., [Character]: [Description]). Again, we will provide specific instructions to the Assistant to ensure the output is as expected.

prompt = f"""\n\nHuman: Provide a list of the 3-4 main characters and a brief description of each main character, in the chapter contained in the <chapter> tags below.
    The Assistant will order the main characters by how many times they are mentioned.
    The Assistant will number the characters.
    
    Follow the template contained in the <template> tags below and replace the placeholders with the relevant information:
    <template>
    [Number]. [Character]: [Description]
    </template>
    
    Here is an example contained in the <example> tags below:
    <example>
    1. Pink Panther: A suave and smooth-talking anthropomorphic animated panther.
    </example>
    
    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our prompt to the same function as before, we should get back character descriptions similar to the following for each chapter:

Here are the main characters from the chapter:

Jonathan Harker: A young English lawyer traveling to Transylvania to help a nobleman with a legal matter. He keeps a journal describing his journey.
Count Dracula: A mysterious Transylvanian nobleman whom Jonathan is traveling to meet. He sends cryptic letters arranging Jonathan's travel.
The Coach Driver: Drives Jonathan through the Borgo Pass to reach Dracula's castle. He seems uneasy about taking Jonathan there.
Wolves: A pack of wolves surrounds the coach as it travels through the Borgo Pass. Their howling terrifies the horses and Jonathan.

Image from Midjourney using the description of the novel's protagonist, Jonathan Harker
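Because the template fixes the shape of each line as [Number]. [Character]: [Description], the response can be turned into structured data with a small parser. The sketch below is one possible approach, not part of the post's source code; the function name and the second sample line are made up for illustration, and a real response may need more tolerant matching.

```python
import re

# Matches lines such as "1. Pink Panther: A suave ... panther."
ENTRY = re.compile(
    r"^[ \t]*(\d+)\.[ \t]*([^:\n]+):[ \t]*(.+)$", flags=re.MULTILINE
)


def parse_characters(response_text):
    """Extract numbered 'Character: Description' entries from a model response."""
    return [
        {"number": int(num), "character": name.strip(), "description": desc.strip()}
        for num, name, desc in ENTRY.findall(response_text)
    ]


# Synthetic response following the template; the second entry is invented
sample_response = """Here are the main characters from the chapter:

1. Pink Panther: A suave and smooth-talking anthropomorphic animated panther.
2. Inspector Clouseau: A bumbling detective.
"""

for entry in parse_characters(sample_response):
    print(entry["number"], entry["character"])
```

Lines that do not follow the template, such as the model's introductory sentence, are simply skipped.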

Using a nearly identical prompting strategy to the previous summary of summaries, we then pass the individual character descriptions to the same function as before to derive an overall character description for Count Dracula based on the individual chapter-level observations:

prompt = f"""\n\nHuman: Write a concise, grammatically correct, single-paragraph description of the main character, Dracula (aka Count Dracula), based on the following individual character descriptions. 
    The Assistant will refrain from using bullet-point lists.

    <summaries>
    {character_summaries}
    </summaries>\n\nAssistant:"""

We should receive a response similar to the following overall character description based on the individual chapter-level descriptions of Count Dracula:

"Here is a single-paragraph description of Count Dracula:

Count Dracula is an ancient vampire of nobility from the remote mountains of Transylvania who resides alone in a decaying yet magnificent castle. He initially appears charming and welcoming but gradually reveals his true sinister and threatening nature. With his gaunt figure, waxen skin, high aquiline nose, pointed ears, sharp teeth, and burning red eyes, Dracula exhibits a mysterious magnetism and power over humans, able to control animals and the weather. He attacks the innocent Jonathan Harker, later turning Lucy Westenra into a vampire. Persistent with his vampiric evil, Dracula possesses supernatural abilities and cunning intelligence, escaping capture multiple times. Though eloquent and refined on the surface, at his core, Dracula is a ruthless predator who spreads his undead curse, requiring the combined efforts of Dr. Van Helsing and his allies to defeat him finally."

Image from Midjourney using the description of the novel's antagonist, Count Dracula

3. Character Types

To quote Essentials in Writing, without characters, there would be no plot or conflicts. Readers should take note of the author's methods of character development, the characters' traits and actions, and how the characters change (or don't change) throughout the story. There are several typical character types used throughout fictional literature, including the Protagonist, Antihero, Antagonist, Guide, Contagonist, Sidekicks (Deuteragonist), Henchmen, Love Interest, Temptress, Confidant, and Foil.


Here is an example prompt to extract those character types from each chapter. Note how we use the <character_types> section to define a list of character types to be considered.

prompt = f"""\n\nHuman: The following list of character types, contained in the <character_types> tags below, are often found in fictional literature: 
    <character_types>
    - Protagonist
    - Antihero
    - Antagonist
    - Guide
    - Contagonist
    - Sidekicks (Deuteragonist)
    - Henchmen
    - Love Interest
    - Temptress
    - Confidant
    - Foil
    </character_types>

    Based on this list of character types, give 3-4 examples of character types found in the chapter in the <chapter> tags below, including the character name and an explanation of why.
    The Assistant will use a bullet-point list.
    The Assistant will not use square brackets in its response.

    Follow the template contained in the <template> tags below and replace the placeholders with the character name, character type, and explanation:
    <template>
    - [Character_Type] - [Character_Name]: [Explanation]
    </template>

    Here is an example contained in the <example> tags below:
    <example>
    - Love Interest - Minnie Mouse: Mickey Mouse's lifelong romantic interest.
    </example>

    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our prompt to the same function as before, we should get back a list of character types similar to the following for each chapter:

Here are some examples of character types from the chapter:

Protagonist - Jonathan Harker: The main character and narrator who is devoted to protecting his wife, Mina.
Love Interest - Mina Harker: Jonathan's wife, whom he loves deeply and wants to protect from Dracula.
Antagonist - Count Dracula: The vampire whom Jonathan and the others are trying to defeat.
Henchman - Dracula's 'earth-boxes': The boxes of earth Dracula uses as hideouts that Jonathan and the others must find and destroy.
Guide/Confidant - Professor Van Helsing: The experienced vampire hunter guides the others in their quest against Dracula.

4. Literary Devices

According to Literary-Devices.com, literary devices are techniques and structures writers use to convey a message, create an effect, and evoke emotions in the reader. These devices can emphasize a literary work, make it more interesting, or convey a deeper meaning. Standard literary devices include Metaphor, Simile, Alliteration, Personification, and Symbolism.

Image from Midjourney using the description of Chapter 7 where the schooner Demeter comes ashore

We will use the following prompt to extract literary devices from the chapters and generate explanations for each device. We have provided a list of literary devices to the model in the <literary_devices> tags. Again, we have included additional instructions to the Assistant to ensure the output is as expected.

prompt = f"""\n\nHuman: The following list of literary devices inside the <literary_devices></literary_devices> XML tags below, are often found in fictional literature: 
    <literary_devices>
    Allegory, Alliteration, Allusion, Amplification, Anagram, 
    Analogy, Anthropomorphism, Antithesis, Chiasmus, Colloquialism, 
    Circumlocution, Epigraph, Euphemism, Foreshadowing, Hyperbole, 
    Imagery, Metaphor, Mood, Motif, Onomatopoeia, Oxymoron, Paradox, 
    Personification, Portmanteau, Puns, Satire, Simile, Symbolism, Tone
    </literary_devices>

    Based on the list of literary devices, give 2-3 examples of literary devices found inside the <chapter></chapter> XML tags below, and explain why.
    The Assistant will use a bullet-point list.
    The Assistant will refrain from including square brackets and XML tags in the response.
    
    Follow the template inside the <template></template> XML tags below for your response. Replace the placeholders, in square brackets, with the literary device and the explanation:
    <template>
    - [Literary_Device]: [Explanation]
    </template>

    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our prompt to the same function again, we should get back a list of literary devices similar to the following for each chapter. Note how the model has incorporated quotes from the chapter into its explanation:

Here are some examples of literary devices found in the provided chapter:

Foreshadowing: When Jonathan describes seeing Dracula crawling down the castle wall like a lizard, this foreshadows Dracula's inhuman, monstrous nature that will be further revealed later.
Imagery: Stoker uses vivid imagery to describe the setting, like "great jagged mountain fastnesses, rising peak on peak, the sheer rock studded with mountain ash and thorn." This helps the reader visualize the remote, ominous location.
Symbolism: The crucifix Jonathan places above his bed symbolizes the protection of his Christian faith against the evil Dracula.
Metaphor: Dracula's eyes are described as "blazing red with passion," like the flames of hell, metaphorically equating Dracula with the devil.
Personification: The creeping mist is personified as having a "deadly chill" that creeps into Jonathan's bones.

Image from Midjourney based on the foreshadowing literary device

5. Settings

According to Britannica, setting, in literature, is the location and time frame in which the action of a narrative takes place. The makeup and behavior of fictional characters often depend on their environment just as much as on their personal characteristics.

We will use the following prompt to generate a numbered list of no more than three settings from each chapter, in order according to how often they are mentioned in the chapter. Again, we use the <template> and <example> tags, along with specific instructions to the Assistant to ensure the output is as expected.

prompt = f"""\n\nHuman: Provide a list of no more than three settings and a brief description of each setting inside the <chapter></chapter> XML tags below.
    The Assistant will order the settings by how many times they are mentioned in the chapter.
    The Assistant will number the list of settings.
    The Assistant will refrain from including square brackets and XML tags in the response.
    
    Follow the template inside the <template></template> XML tags below and replace the placeholders, in square brackets, with the relevant information:
    <template>
    [Number]. [Setting]: [Description]
    </template>

    Here is an example inside the <example></example> XML tags below:
    <example>
    1. Hoboken, New Jersey: Part of the New York metropolitan area on the banks of the Hudson River across from lower Manhattan, where the story takes place.
    </example>

    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our prompt to the same function again, we should get back a numbered list of settings similar to the following for each chapter:

Here are the top 3 settings mentioned in the chapter, with brief descriptions:

Dr. Seward's house/asylum: This is where the majority of the chapter takes place, with scenes in Dr. Seward's study, Renfield's room, etc. It is an asylum where Dr. Seward cares for psychiatric patients, including Renfield.
Carfax: The old house purchased by Dracula in Purfleet. Some of the boxes of earth were delivered here from Dracula's castle. The men plan to investigate Carfax further.
Dracula's castle: Dracula's residence in Transylvania. It is where he originally lived before purchasing properties in England. Van Helsing recounts Dracula's history and roots in the castle and the area around it.

Image from Midjourney based on a description of Dr. Seward's asylum

6. Questions and Answers

In addition to analysis, we can leverage the large language model to produce questions and answers based on the content of each chapter. The questions could be used to develop training and evaluation material or converted to dense vector embeddings and used for semantic similarity searches. Note the requested question-and-answer format of the <template> section in this prompt. This is reinforced by providing an example in the <example> section. Lastly, we have added even more instructions to the Assistant to ensure the output is as expected.

prompt = f"""\n\nHuman: Generate a list of 6 questions that the following chapter, inside the <chapter></chapter> XML tags below, could be used to answer. 
    The Assistant will provide both the question and the answer.
    The Assistant will refrain from asking overly broad questions.
    The Assistant will refrain from using bullet-point lists.
    The Assistant will refrain from including square brackets and XML tags in the response.
    
    Follow the template inside the <template></template> XML tags below and replace the placeholders, in square brackets, with the relevant information:
    <template>
    Q: [Question]
    A: [Answer]
    </template>

    Here is an example inside the <example></example> XML tags below:
    <example>
    Q: What is the weather like in Spain?
    A: The rain in Spain stays mainly in the plain.
    </example>

    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our prompt to the same function as before, we should get back a list of questions and answers similar to the following for each chapter:

Here are three questions that could be answered by the provided chapter:

Q: What happened when Jonathan Harker woke up back in his room after his encounter with Dracula?

A: He was confused about how he got back to his room, indicating Dracula must have carried him there. His clothes were folded neatly, which was unlike him, and his watch was unwound, suggesting Dracula hurriedly undressed him.

Q: How does Jonathan plan to try to escape from Castle Dracula?

A: He decides his only chance is to climb down the castle wall, sneak into Dracula's room through the window, and search for the key that will let him out of the castle.

Q: What does Jonathan discover when he finds Dracula's body in a box in the underground crypt?

A: He finds Dracula appearing partially rejuvenated, with dark hair instead of white, and a flushed, blood-stained face. This horrifies Jonathan, who sees Dracula as a monster. In a rage, Jonathan tries to kill Dracula but fails before fleeing the crypt.
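To reuse the generated questions as evaluation material, the Q:/A: lines need to be collected into pairs. Here is a minimal line-based sketch; the function name parse_qa is hypothetical, it assumes each question and answer fits on a single line as in the template, and the sample text is the example from the prompt rather than real model output.

```python
def parse_qa(response_text):
    """Collect Q:/A: line pairs from a model response into a list of dicts."""
    pairs = []
    question = None
    for line in response_text.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append({"question": question, "answer": line[2:].strip()})
            question = None
    return pairs


sample_response = """Here is a question that could be answered by the provided text:

Q: What is the weather like in Spain?

A: The rain in Spain stays mainly in the plain.
"""

for pair in parse_qa(sample_response):
    print(pair["question"], "->", pair["answer"])
```

An answer line with no preceding question is ignored, which guards against a response that was cut off partway through a pair.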

By modifying the question-and-answer format prompt above, we can generate multiple-choice questions. Note the increased complexity of the <template> section in this prompt. We can indicate precisely how many choices we want to generate.

prompt = f"""\n\nHuman: Generate a list of 6 multiple-choice questions that the following chapter, inside the <chapter></chapter> XML tags below, could be used to answer. 
    The Assistant will provide the question, four possible answers lettered a,b,c, and d, and the correct answer.
    The Assistant will ask brief, specific questions.
    The Assistant will refrain from using bullet-point lists.
    The Assistant will refrain from including square brackets and XML tags in the response.
    
    Follow the template inside the <template></template> XML tags below and replace the placeholders, in square brackets, with the relevant information:
    <template>
    Q: [Question]
    (a) [Choice_1]
    (b) [Choice_2]
    (c) [Choice_3]
    (d) [Choice_4]
    A: (Letter) [Correct_Answer]
    </template>
    
    Here is an example inside the <example></example> XML tags below:
    <example>
    Q: What color is fresh grass?
    (a) Red
    (b) Blue
    (c) Green
    (d) Yellow
    A: (c) Green
    </example>
    
    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""
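Since the number of questions and choices is fixed by the wording and the template, both can be exposed as parameters. The sketch below is one possible way to do that, assuming the same prompt wording as above; the function name `build_mcq_prompt` and its parameters are hypothetical and do not come from the original project. The grass-color example is left out here because it would only match the four-choice case.

```python
import string


def build_mcq_prompt(chapter: str, num_questions: int = 6, num_choices: int = 4) -> str:
    """Build a multiple-choice prompt whose template matches num_choices.

    Hypothetical helper; the instructions mirror the prompt shown in the post.
    """
    if not 2 <= num_choices <= 26:
        raise ValueError("num_choices must be between 2 and 26")

    letters = string.ascii_lowercase[:num_choices]
    # One "(a) [Choice_1]" style line per requested choice.
    choice_lines = "\n".join(
        f"({letter}) [Choice_{i}]" for i, letter in enumerate(letters, start=1)
    )
    lettered = ", ".join(letters)

    return (
        "\n\nHuman: "
        f"Generate a list of {num_questions} multiple-choice questions that the "
        "following chapter, inside the <chapter></chapter> XML tags below, "
        "could be used to answer.\n"
        f"The Assistant will provide the question, {num_choices} possible answers "
        f"lettered {lettered}, and the correct answer.\n"
        "The Assistant will ask brief, specific questions.\n"
        "The Assistant will refrain from using bullet-point lists.\n"
        "The Assistant will refrain from including square brackets and XML tags "
        "in the response.\n\n"
        "Follow the template inside the <template></template> XML tags below and "
        "replace the placeholders, in square brackets, with the relevant information:\n"
        "<template>\n"
        "Q: [Question]\n"
        f"{choice_lines}\n"
        "A: (Letter) [Correct_Answer]\n"
        "</template>\n\n"
        "<chapter>\n"
        f"{chapter}\n"
        "</chapter>"
        "\n\nAssistant:"
    )
```

For instance, `build_mcq_prompt(chapter, num_questions=3, num_choices=5)` would produce a template with choices (a) through (e).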

Passing our modified prompt to the same function as before, we should get back multiple-choice questions similar to the following for each chapter:

Here are three multiple-choice questions based on the chapter:

Q: What was the name of the hotel where Jonathan Harker stayed in Bistritz?

(a) The Golden Lion

(b) The Silver Stag

(c) The Golden Krone

(d) The Red Bear

A: (c) The Golden Krone

Q: What did Jonathan order for dinner at the hotel that was described as a national dish?

(a) Goulash

(b) Schnitzel

(c) Paprika Hendl

(d) Sauerbraten

A: (c) Paprika Hendl

Q: What did the old lady at the hotel give Jonathan as a charm against evil before he departed?

(a) A four-leaf clover

(b) A crucifix

(d) An amulet

A: (b) A crucifix
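A model is not guaranteed to follow the template exactly, so it can be worth checking each generated question before using it, for example in a quiz. The following is a minimal sketch under stated assumptions: the names `parse_mcq` and `find_problems` are hypothetical, and the parser assumes the question, each choice, and the answer each appear on their own line in the template's format.

```python
import re

QUESTION_RE = re.compile(r"^Q:\s*(.+)$")
CHOICE_RE = re.compile(r"^\(([a-z])\)\s*(.+)$")
ANSWER_RE = re.compile(r"^A:\s*\(([a-z])\)\s*(.+)$")


def parse_mcq(response: str) -> list[dict]:
    """Parse template-formatted multiple-choice questions into dictionaries."""
    questions = []
    current = None
    for raw_line in response.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if m := QUESTION_RE.match(line):
            current = {"question": m.group(1), "choices": {}, "answer": None}
            questions.append(current)
        elif current is None:
            continue  # Preamble before the first question.
        elif m := ANSWER_RE.match(line):
            current["answer"] = m.group(1)
        elif m := CHOICE_RE.match(line):
            current["choices"][m.group(1)] = m.group(2)
    return questions


def find_problems(questions: list[dict], num_choices: int = 4) -> list[str]:
    """Report questions with a wrong choice count or an unlisted answer letter."""
    problems = []
    for i, q in enumerate(questions, start=1):
        if len(q["choices"]) != num_choices:
            problems.append(
                f"Question {i}: expected {num_choices} choices, "
                f"found {len(q['choices'])}"
            )
        if q["answer"] not in q["choices"]:
            problems.append(
                f"Question {i}: answer letter {q['answer']!r} is not among the choices"
            )
    return problems
```

Questions flagged by `find_problems` could be discarded or regenerated; which is preferable depends on the use case.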

References

Anthropic: Prompt engineering for Claude’s long context window
Anthropic: Introduction to prompt design
Anthropic: Advanced Text Analysis
DAIR: Prompt Engineering Guide

Conclusion

In this post, we learned how to move beyond simple long-text summarization and analyze different aspects of a document. Further, we built progressively refined prompts, which help precisely shape the output from the Anthropic Claude 2 foundation model, accessed through Amazon Bedrock. 👻

This blog represents my viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners.

