Revealer Summarization: Transforming Documents into Actionable Insights
Rafia Anis
Sr. Developer, Revealer
December 2025
Executive Summary
In an era of information overload, businesses and professionals are struggling to process large volumes of unstructured text efficiently. Traditional summarization methods often fail to capture key insights while maintaining context.
This white paper introduces a comprehensive AI-driven summarization workflow, where:
- ✅Users upload documents (PDFs, Word files, PPT, text files, .md files, etc.)
- ✅Documents are converted to markdown using Microsoft's MarkItDown library.
- ✅Tokens are counted with tiktoken to measure each document's length.
- ✅AI-powered summarization techniques—Stuff, Map-Reduce, and Refine—generate structured, concise summaries.
By automating this workflow, organizations can extract key insights, improve decision-making, and save significant time when analyzing reports, research papers, customer feedback, and legal documents.
Introduction: The Need for AI-Driven Summarization
The explosion of digital content has made it increasingly difficult for businesses, researchers, and professionals to manually process large amounts of text. Whether it's legal contracts, research papers, or financial reports, reading and summarizing documents takes time and effort.
The Challenges of Traditional Summarization
- 🚫Manual summarization is slow and inconsistent
- 🚫Key information may be lost due to human bias
- 🚫Extractive summarization often lacks coherence
- 🚫Large language models (LLMs) struggle with long documents unless the input is split and structured to fit their context window
To address these challenges, our AI-driven summarization pipeline automates document processing, converts text into markdown, counts tokens, and applies a multi-step summarization approach using the Stuff, Map-Reduce, and Refine methods.
Step 1: Document Upload & Preprocessing
Users can upload documents in various formats:
- 📄 PDF (.pdf)
- 📝 Word (.docx, .doc)
- 📜 Plain Text (.txt, .md)
- 📊 PPT (.pptx)
These files are parsed and converted to markdown with Microsoft's MarkItDown library, producing clean, structured text for AI processing.
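A minimal sketch of the upload check and conversion step, assuming uploads arrive as local file paths. The extension allowlist simply mirrors the formats listed above, and `SUPPORTED_EXTENSIONS`, `is_supported`, and `to_markdown` are hypothetical helper names, not part of Revealer or MarkItDown; only `MarkItDown().convert(...).text_content` is the library's own API.

```python
from pathlib import Path

# Hypothetical allowlist mirroring the formats listed in Step 1.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".md", ".pptx"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the allowlist (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def to_markdown(filename: str) -> str:
    """Reject unsupported uploads, then convert the file to markdown with MarkItDown."""
    if not is_supported(filename):
        raise ValueError(f"Unsupported file type: {Path(filename).suffix or '(none)'}")
    # Imported here so the validation above works even where markitdown is not installed.
    from markitdown import MarkItDown
    return MarkItDown().convert(filename).text_content
```

Validating the extension up front gives users a clear error before any parsing work starts.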
Why Markdown?
- Preserves structure (headings, lists, tables)
- Improves AI readability
- Reduces unnecessary formatting noise
Why Tiktoken?
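tiktoken is a fast BPE tokenizer for use with OpenAI's models. Used with the encoding that matches the target model, it counts tokens the way the model sees them, which shows how much of the model's context window a document will consume. Once converted, the markdown text is tokenized and its tokens are counted.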
Step 2: AI Summarization Using Stuff, Map-Reduce, and Refine Techniques
Based on the token count, we select the most suitable summarization technique.
The Three Summarization Techniques
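As a minimal sketch of this selection rule, the function below maps a token count to one of the three techniques. The 128,000-token default (GPT-4o's context window), the 25% reserve for instructions and output, and the `prefer_accuracy` flag are illustrative assumptions, not Revealer's actual thresholds.

```python
def choose_method(token_count: int, context_limit: int = 128_000,
                  prefer_accuracy: bool = False) -> str:
    """Pick a summarization technique from a document's token count.

    Assumed rule (illustrative only): if the document fits comfortably in one
    prompt, use Stuff; otherwise use Refine when accuracy matters most, else
    Map-Reduce for throughput.
    """
    # Reserve ~25% of the window for the instructions and the generated summary.
    budget = int(context_limit * 0.75)
    if token_count <= budget:
        return "stuff"
    return "refine" if prefer_accuracy else "map_reduce"
```

For example, with the default limit a 2,000-token memo goes to Stuff, while a 200,000-token report goes to Map-Reduce, or to Refine when `prefer_accuracy=True`.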
To efficiently summarize markdown documents, we use three complementary AI-driven techniques:
1. Stuff Method
- ✅Best for short documents
- ✅Directly feeds the entire markdown text into an LLM for summarization
- ✅Fast and efficient, but limited by the model's context window, so less effective for large documents
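A framework-free sketch of the Stuff method, assuming `llm` is any function that sends a prompt to a model and returns its reply, and `count_tokens` is a token counter (for instance one wrapping tiktoken's `encode`). The function name and the `max_tokens` guard are hypothetical; the prompt wording follows the Refine example's initial prompt later in this paper.

```python
from typing import Callable


def stuff_summarize(text: str, llm: Callable[[str], str],
                    count_tokens: Callable[[str], int], max_tokens: int) -> str:
    """Stuff method: put the entire document into a single prompt."""
    prompt = f"Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"
    # The whole prompt must fit in the model's context window; otherwise
    # a chunked method (Map-Reduce or Refine) is needed.
    if count_tokens(prompt) > max_tokens:
        raise ValueError("Document too long for the Stuff method")
    return llm(prompt)
```

One model call per document is what makes Stuff fast; the guard simply makes its size limit explicit.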
2. Map-Reduce Method
- ✅Best for large documents
- ✅Breaks markdown into smaller sections, summarizes each separately (Map phase)
- ✅Merges and refines individual summaries into a final version (Reduce phase)
- ✅More scalable than Stuff for lengthy reports
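A framework-free sketch of Map-Reduce, under the same assumptions as the Stuff sketch (`llm` and `count_tokens` are stand-in callables). The paragraph-packing splitter, the 1,000-token default, and the reduce prompt wording are illustrative choices, not Revealer's implementation.

```python
from typing import Callable, List


def split_into_chunks(text: str, count_tokens: Callable[[str], int],
                      chunk_size: int) -> List[str]:
    """Greedily pack paragraphs into chunks of at most chunk_size tokens.

    A single paragraph longer than chunk_size becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    for para in text.split("\n\n"):
        candidate = "\n\n".join(current + [para])
        if current and count_tokens(candidate) > chunk_size:
            chunks.append("\n\n".join(current))
            current = [para]
        else:
            current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def map_reduce_summarize(text: str, llm: Callable[[str], str],
                         count_tokens: Callable[[str], int],
                         chunk_size: int = 1000) -> str:
    """Map: summarize each chunk independently. Reduce: combine the partial summaries."""
    chunks = split_into_chunks(text, count_tokens, chunk_size)
    partials = [llm(f"Write a concise summary of the following:\n\n{c}") for c in chunks]
    combined = "\n\n".join(partials)
    return llm(f"Combine these partial summaries into one final summary:\n\n{combined}")
```

The map calls are independent, so they could run in parallel. This sketch performs a single reduce call; if the partial summaries themselves exceed the context window, they would need to be combined again in batches.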
3. Refine Method
- ✅Best for high-accuracy summarization
- ✅Summarizes incrementally by iteratively refining a draft summary
- ✅Improves coherence and context retention
- ✅Useful for technical, legal, and research documents
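A framework-free sketch of the Refine loop, assuming the document has already been split into ordered chunks and `llm` is a stand-in callable as in the sketches above. The prompts paraphrase the Refine templates in the code snippets below; `refine_summarize` is a hypothetical name.

```python
from typing import Callable, List


def refine_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    """Refine method: summarize the first chunk, then update that draft with each later chunk."""
    if not chunks:
        return ""
    # Initial draft from the first chunk.
    summary = llm(f"Write a concise summary of the following:\n{chunks[0]}\nCONCISE SUMMARY:")
    # Each later chunk is processed in order against the running draft.
    for chunk in chunks[1:]:
        summary = llm(
            "Your job is to produce a final summary.\n"
            f"We have provided an existing summary up to a certain point: {summary}\n"
            "Refine the existing summary (only if needed) with the additional context below.\n"
            f"------------\n{chunk}\n------------\n"
            "If the context isn't useful, return the original summary."
        )
    return summary
```

Unlike Map-Reduce, these calls must run sequentially because each step needs the previous draft; that is the price paid for better context retention.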
By combining these techniques, we balance efficiency and accuracy across different document sizes and complexities.

Figure: Revealer Summary workflow from document upload to summary output
Step 3: Real-World Applications of AI Summarization
- 🚀Business Intelligence & Market Research – Quickly extract key insights from industry reports and competitor analysis.
- 📚Academic Research – Summarize research papers and generate concise literature reviews.
- 📝Legal & Compliance – Process and summarize contracts, regulations, and policy documents.
- 📞Customer Support & Feedback – Analyze and condense support tickets, reviews, and survey responses.
- 🏥Healthcare & Medical – Summarize patient records, research papers, and clinical reports for quick decision-making.
Code Snippets
1. Microsoft MarkItDown
Install MarkItDown (the [all] extra pulls in the optional converters, such as PDF, Word, and PowerPoint support):
pip install 'markitdown[all]'
Python code:
from markitdown import MarkItDown

md = MarkItDown()
# Convert an uploaded file (here a PDF) to markdown
result = md.convert("test.pdf")
# The converted markdown text
print(result.text_content)
2. Tiktoken
Install Tiktoken:
pip install tiktoken
Python code:
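import tiktoken

# Load an encoding by name and check that encode/decode round-trips
enc = tiktoken.get_encoding("o200k_base")
assert enc.decode(enc.encode("hello world")) == "hello world"
# To get the tokenizer corresponding to a specific model in the OpenAI API:
enc = tiktoken.encoding_for_model("gpt-4o")
# Count tokens in the markdown from the MarkItDown snippet; special-token strings are treated as plain text
tokens = enc.encode(result.text_content, disallowed_special=())
token_count = len(tokens)
print(f"Token count: {token_count}")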
3. Summarization Techniques
Stuff Method
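from langchain.chains.summarize import load_summarize_chain
# Stuff method - for short documents: all of `docs` goes into a single prompt
# (assumes `llm` is a LangChain chat model and `docs` is a list of Document objects)
chain = load_summarize_chain(llm, chain_type="stuff")
result = chain.invoke({"input_documents": docs})
print(result["output_text"])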
Map-Reduce Method
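from langchain.chains.summarize import load_summarize_chain
from langchain_text_splitters import CharacterTextSplitter

map_reduce_chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    return_intermediate_steps=False,
)
# Split by token count (same o200k_base encoding as above) into chunks of roughly 1,000 tokens
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="o200k_base", chunk_size=1000, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs)
# Map: summarize each chunk; Reduce: combine the chunk summaries into one
result = map_reduce_chain.invoke({"input_documents": split_docs})
print(result["output_text"])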
Refine Method
from langchain.chains.summarize import load_summarize_chain
from langchain_core.prompts import PromptTemplate

# Prompt for the first chunk (initial draft summary)
prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

# Prompt for each later chunk: refine the running summary with new context
refine_template = (
    "Your job is to produce a final summary.\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary. "
    "If the context isn't useful, return the original summary."
)
refine_prompt = PromptTemplate.from_template(refine_template)

chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    question_prompt=prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key="output_text",
)
# Reuses split_docs from the Map-Reduce example; chunks are processed in order
result = chain.invoke({"input_documents": split_docs})
print(result["output_text"])

Figure: Revealer Summary interface

Figure: AI-generated summary output
Why This Summarization Pipeline is a Game-Changer
Key Features
- ✅Automation – Eliminates manual summarization, saving hours of work
- ✅Scalability – Handles both short and long documents efficiently
- ✅Customization – Users can choose the summarization method based on document size and complexity
- ✅Accuracy & Context Retention – Refine method ensures high-quality summaries

Figure: Revealer Personal Space with summarized documents
Conclusion
AI-powered summarization is revolutionizing the way we process and extract insights from text. By combining markdown conversion and advanced Revealer summarization techniques, businesses and professionals can save time, improve efficiency, and enhance decision-making.
🚀 How do you currently summarize documents? Would Revealer-powered summarization improve your workflow? Let's discuss!
#RevealerAI
#Product
#AI
#Summarization
#MachineLearning
#Productivity
#BusinessAutomation
Ready to Transform Your Document Workflow?
See how Revealer can automate your document summarization and save hours of manual work.
© 2026 Revealer. All rights reserved.