Introduction
In today's fast-paced digital world, staying updated with the latest AI news and developments can be overwhelming. This tutorial will show you how to build an intelligent agent system that automatically scrapes web content and generates concise summaries, perfect for newsletters or content curation.
What You'll Learn
- How to set up a multi-agent system using CrewAI
- How to create specialized agents for web scraping and content writing
- How to process and summarize web content using OpenAI
- How to handle multiple URLs and generate cohesive summaries
Prerequisites
Before we begin, make sure you have:
pip install crewai python-dotenv openai
- Python 3.7 or higher installed
- An OpenAI API key
- Basic understanding of Python
Project Structure
First, let's set up our project structure:
newsletter_agent/
├── .env # Environment variables
├── main.py # Main script
├── requirements.txt # Dependencies
└── tools/
└── scraper_tools.py # Web scraping utilities
Step 1: Environment Setup
Create a .env file to store your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
Step 2: Creating the Scraper Tool
Create the scraper tool in tools/scraper_tools.py
:
import requests
from bs4 import BeautifulSoup
from langchain.tools import BaseTool
class ScraperTool(BaseTool):
name = "Website Scraper"
description = "Scrapes text content from websites"
def scrape(self, url: str) -> str:
try:
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# Remove script and style elements
for script in soup(["script", "style"]):
script.decompose()
# Get text content
text = soup.get_text()
# Clean up text
lines = (line.strip() for line in text.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
text = ' '.join(chunk for chunk in chunks if chunk)
return text
except Exception as e:
return f"Error scraping {url}: {str(e)}"
Step 3: Building the Newsletter Crew
Here's our main script that creates and manages the AI agents:
import os
from crewai import Agent, Task, Crew
from textwrap import dedent
from dotenv import load_dotenv
from tools.scraper_tools import ScraperTool
# Load environment variables
load_dotenv()
class NewsletterCrew:
def __init__(self, urls):
self.urls = [urls] if isinstance(urls, str) else urls
self.scrape_tool = ScraperTool().scrape
def create_agents(self):
scraper = Agent(
role='Website Content Scraper',
goal='Scrape and extract text content from provided URLs',
backstory="Expert at extracting and processing web content",
tools=[self.scrape_tool],
verbose=True
)
writer = Agent(
role='AI Content Summarizer',
goal='Create concise, engaging summaries of AI-related content',
backstory="Skilled content creator specializing in AI topics",
verbose=True
)
return scraper, writer
def run(self):
scraper, writer = self.create_agents()
tasks = [
Task(
description=f"Scrape content from: {', '.join(self.urls)}",
agent=scraper
),
Task(
description="Create an engaging summary of the AI content",
agent=writer
)
]
crew = Crew(
agents=[scraper, writer],
tasks=tasks,
verbose=2
)
return crew.kickoff()
Step 4: Using the Newsletter Agent
Here's how to use the newsletter agent:
from newsletter_crew import NewsletterCrew
# Initialize with URLs
urls = [
"https://example.com/ai-article1",
"https://example.com/ai-article2"
]
# Create and run the crew
newsletter_crew = NewsletterCrew(urls)
summary = newsletter_crew.run()
print(summary)
Step 5: Customization Options
You can enhance this system in several ways:
- Content Filtering: Add filters to focus on specific topics or keywords
- Output Formatting: Customize the summary format for different platforms
- Additional Agents: Add agents for fact-checking or content categorization
- Error Handling: Implement retry mechanisms and better error handling
Best Practices
- Always use environment variables for API keys
- Implement proper error handling for web scraping
- Consider rate limiting for multiple URLs
- Cache results when possible to avoid redundant API calls
- Validate URLs before processing
Final Thoughts
Building an AI newsletter agent with CrewAI and OpenAI provides a powerful way to automate content curation and summarization. This system can be extended and customized to handle various use cases, from personal news digests to professional content curation.
The complete code for this project is available on our GitHub repository.