Introduction

In today's fast-paced digital world, keeping up with the latest AI news and developments can be overwhelming. This tutorial shows you how to build an intelligent agent system that automatically scrapes web content and generates concise summaries, perfect for newsletters or content curation.

What You'll Learn

  • How to set up a multi-agent system using CrewAI
  • How to create specialized agents for web scraping and content writing
  • How to process and summarize web content using OpenAI
  • How to handle multiple URLs and generate cohesive summaries

Prerequisites

Before we begin, make sure you have:

  • Python 3.7 or higher installed
  • An OpenAI API key
  • Basic understanding of Python

Install the required packages:

pip install crewai langchain python-dotenv openai requests beautifulsoup4

Project Structure

First, let's set up our project structure:

newsletter_agent/
├── .env                  # Environment variables
├── main.py              # Main script
├── requirements.txt     # Dependencies
└── tools/
    └── scraper_tools.py # Web scraping utilities
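The requirements.txt file mirrors the install command from the prerequisites. The entries below are left unpinned; pin versions as your project requires:

crewai
langchain
python-dotenv
openai
requests
beautifulsoup4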

Step 1: Environment Setup

Create a .env file to store your OpenAI API key:

OPENAI_API_KEY=your_openai_api_key_here
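You can confirm the key is picked up before going further. A minimal sanity check, assuming you run it from the project root where .env lives:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"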

Step 2: Creating the Scraper Tool

Create the scraper tool in tools/scraper_tools.py:

import requests
from bs4 import BeautifulSoup
from langchain.tools import BaseTool

class ScraperTool(BaseTool):
    name: str = "Website Scraper"
    description: str = "Scrapes text content from websites"

    def _run(self, url: str) -> str:
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.text, 'html.parser')

            # Remove script and style elements
            for script in soup(["script", "style"]):
                script.decompose()

            # Get text content
            text = soup.get_text()

            # Clean up text: strip each line, split on double spaces,
            # and rejoin the non-empty chunks into one string
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            text = ' '.join(chunk for chunk in chunks if chunk)

            return text
        except Exception as e:
            return f"Error scraping {url}: {str(e)}"

Step 3: Building the Newsletter Crew

Here's our main script that creates and manages the AI agents:

from crewai import Agent, Task, Crew
from dotenv import load_dotenv
from tools.scraper_tools import ScraperTool

# Load environment variables
load_dotenv()

class NewsletterCrew:
    def __init__(self, urls):
        # Accept a single URL string or a list of URLs
        self.urls = [urls] if isinstance(urls, str) else urls
        self.scrape_tool = ScraperTool()

    def create_agents(self):
        scraper = Agent(
            role='Website Content Scraper',
            goal='Scrape and extract text content from provided URLs',
            backstory="Expert at extracting and processing web content",
            tools=[self.scrape_tool],
            verbose=True
        )

        writer = Agent(
            role='AI Content Summarizer',
            goal='Create concise, engaging summaries of AI-related content',
            backstory="Skilled content creator specializing in AI topics",
            verbose=True
        )

        return scraper, writer

    def run(self):
        scraper, writer = self.create_agents()
        
        scrape_task = Task(
            description=f"Scrape content from: {', '.join(self.urls)}",
            expected_output="The raw text content extracted from each URL",
            agent=scraper
        )

        write_task = Task(
            description="Create an engaging summary of the AI content",
            expected_output="A concise, newsletter-ready summary",
            agent=writer,
            context=[scrape_task]  # give the writer the scraped output
        )

        crew = Crew(
            agents=[scraper, writer],
            tasks=[scrape_task, write_task],
            verbose=True
        )

        return crew.kickoff()

Step 4: Using the Newsletter Agent

Add the following to the bottom of main.py so the crew runs when you execute the script:

if __name__ == "__main__":
    # Initialize with URLs
    urls = [
        "https://example.com/ai-article1",
        "https://example.com/ai-article2"
    ]

    # Create and run the crew
    newsletter_crew = NewsletterCrew(urls)
    summary = newsletter_crew.run()

    print(summary)

Then start the agent with:

python main.py

Step 5: Customization Options

You can enhance this system in several ways:

  • Content Filtering: Add filters to focus on specific topics or keywords
  • Output Formatting: Customize the summary format for different platforms
  • Additional Agents: Add agents for fact-checking or content categorization
  • Error Handling: Implement retry mechanisms and better error handling (see the sketch after this list)
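Building on the error-handling bullet above, here is a minimal retry sketch you could fold into ScraperTool._run; the retry count, timeout, and backoff delay are illustrative assumptions:

import time
import requests

def fetch_with_retries(url: str, retries: int = 3, backoff: float = 2.0) -> str:
    """Fetch a URL, retrying on transient network errors."""
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response.text
        except requests.RequestException:
            if attempt == retries:
                raise  # out of retries; let the caller decide what to do
            time.sleep(backoff * attempt)  # linear backoff between attempts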

Best Practices

  • Always use environment variables for API keys
  • Implement proper error handling for web scraping
  • Consider rate limiting for multiple URLs
  • Cache results when possible to avoid redundant API calls
  • Validate URLs before processing (see the sketch after this list)
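For the validation bullet above, a minimal sketch using only the standard library; restricting to http(s) schemes is an assumption:

from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    """Accept only well-formed http(s) URLs."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

candidates = ["https://example.com/ai-article1", "not-a-url"]
valid = [u for u in candidates if is_valid_url(u)]  # keeps only the first entry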

Final Thoughts

Building an AI newsletter agent with CrewAI and OpenAI provides a powerful way to automate content curation and summarization. This system can be extended and customized to handle various use cases, from personal news digests to professional content curation.

The complete code for this project is available on our GitHub repository.

Relevant Hashtags

#AIAgent #CrewAI #OpenAI #WebScraping #ContentAutomation #Python #ArtificialIntelligence #NewsletterAutomation #ContentCuration #AutomatedSummary