How to Configure the Web Page Collector

Learn how to use the Web Page Collector in DigitalStakeout XTI to capture, archive, and analyze targeted URLs and web content.

The Web Page Collector in DigitalStakeout XTI allows you to collect, archive, and monitor web pages of interest on a scheduled or ad hoc basis. This tool is ideal for tracking high-risk URLs, documenting profile changes, capturing digital evidence, or watching for defacements and content changes.


Use Cases

  • 📸 Evidence Preservation — Capture time-stamped screenshots and raw content
  • 👁️ Profile Monitoring — Archive changes to a person’s or organization’s online footprint
  • 🧪 Threat Monitoring — Track updates to suspicious infrastructure or foreign-controlled sites
  • 🛡️ Executive Protection — Log public-facing mentions, threats, or impersonations
  • 📂 Legal or Compliance Review — Maintain snapshots of critical content for review or audit

Setting Up a Web Page Collector

1. General Configuration

Field          Description
Status         Enable or disable the collector
Monitor Name   Assign a meaningful name (e.g., "Threat Actor Blog Tracker")
Expires On     Set an optional end date for collection
Use Case       Select from options like Threat Intel, Evidence Logging, etc.
Tags           Apply project, subject, or entity tags
Send Data To   Choose the destination (folder, integration, or alerts)
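
Before entering these fields in XTI, it can help to draft them as a small checklist and catch obvious gaps. The sketch below is only a planning aid: the GeneralConfig class, its attribute names, and the problems() helper are hypothetical and are not part of any XTI API. The checks it runs (non-empty name, future expiry date, non-empty destination) are assumptions about what a sensible configuration looks like, not rules documented by the product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class GeneralConfig:
    """Hypothetical planning record for the General Configuration fields."""
    monitor_name: str                     # Monitor Name
    use_case: str                         # Use Case
    send_data_to: str                     # Send Data To
    enabled: bool = True                  # Status
    expires_on: Optional[date] = None     # Expires On (optional)
    tags: List[str] = field(default_factory=list)  # Tags

    def problems(self, today: date) -> List[str]:
        """Return human-readable issues to fix before saving the collector."""
        issues = []
        if not self.monitor_name.strip():
            issues.append("Monitor Name is empty")
        if self.expires_on is not None and self.expires_on <= today:
            issues.append("Expires On is not in the future")
        if not self.send_data_to.strip():
            issues.append("Send Data To has no destination")
        return issues


config = GeneralConfig(
    monitor_name="Threat Actor Blog Tracker",
    use_case="Threat Intel",
    send_data_to="folder",
    expires_on=date(2030, 1, 1),
    tags=["project-x"],
)
print(config.problems(today=date(2025, 1, 1)))  # prints []
```

An empty list means the draft passes these assumed checks; the values still have to be entered in the XTI interface by hand.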

2. URL Collection Settings

Setting                Description
Target URL(s)          Enter one or more URLs you want to monitor or archive
Screenshot Mode        Enable visual capture for legal or archival purposes
Render Settings        Choose whether to execute JavaScript (headless browser rendering)
Collection Frequency   Schedule the crawl interval (hourly, daily, weekly, etc.)
Trigger On Match       Optionally trigger alerts when content contains specific keywords
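
To get a feel for how Target URL(s) and Trigger On Match interact, the following sketch checks a URL list and tests sample page text against a keyword list offline. It is an illustration under stated assumptions: is_valid_target and matched_terms are hypothetical helper names, and the case-insensitive, whole-word matching shown here is an assumption. This article does not specify how XTI itself matches keywords.

```python
import re
from typing import List
from urllib.parse import urlparse


def is_valid_target(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def matched_terms(text: str, keywords: List[str]) -> List[str]:
    """Return the keywords found in text.

    Assumed semantics: case-insensitive, and a keyword must not be
    embedded inside a longer word ("leak" does not match "leaked").
    """
    found = []
    for kw in keywords:
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found


targets = ["https://example.com/blog", "example.com/no-scheme"]
print([u for u in targets if not is_valid_target(u)])

sample = "New LEAK posted: data dump available"
print(matched_terms(sample, ["leak", "data dump", "ransom"]))
```

Running candidate keywords against a saved copy of the page in this way is a quick check that a trigger list is neither too broad nor too narrow before relying on it for alerts.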

Monitoring Workflow

Each time a URL is captured:

  • The full page content (HTML and rendered text) is stored
  • A screenshot (if enabled) is archived
  • Metadata, including title, source URL, date/time, and matched terms, is indexed
  • Content is searchable and linked to the associated monitor inside XTI
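
The stored items listed above can be pictured as one record per capture. The sketch below models such a record and compares two captures of the same URL by hashing the rendered text. The Capture class, its field names, and the SHA-256 fingerprint are illustrative assumptions; they are not XTI's storage schema or its change-detection method.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class Capture:
    """Hypothetical per-capture record mirroring the metadata listed above."""
    source_url: str
    title: str
    captured_at: datetime
    rendered_text: str
    matched_terms: Tuple[str, ...] = ()

    @property
    def fingerprint(self) -> str:
        """SHA-256 of the rendered text with whitespace runs collapsed."""
        normalized = " ".join(self.rendered_text.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous: Capture, current: Capture) -> bool:
    """True when the rendered text differs between two captures."""
    return previous.fingerprint != current.fingerprint


first = Capture(
    source_url="https://example.com/profile",
    title="Profile",
    captured_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
    rendered_text="Bio: security researcher",
)
second = Capture(
    source_url="https://example.com/profile",
    title="Profile",
    captured_at=datetime(2025, 1, 2, tzinfo=timezone.utc),
    rendered_text="Bio: independent journalist",
)
print(has_changed(first, second))  # prints True
```

Collapsing whitespace before hashing keeps trivial layout differences from registering as a change; a real comparison might also need to ignore rotating ads or timestamps embedded in the page.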

Best Practices

  • ✅ Use for sites likely to change or delete content (social media, extremist blogs, doxxing forums)
  • 🎯 Use keyword triggers when you're tracking narrative shifts or high-risk terms
  • 🕒 Schedule frequency based on volatility of the source (e.g., hourly for fast-moving targets)
  • 🗂 Combine with Keyword Feed or Profile Tracker for full context
  • 🛑 Avoid over-crawling stable or irrelevant pages to preserve quota and reduce noise
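
A rough capture count makes the trade-off between frequency and quota concrete. The sketch below estimates how many captures a monitor would produce over a period, assuming one capture per URL per interval. The function name and the interval table are hypothetical, and how XTI actually meters quota is not described in this article.

```python
# Interval length in hours for the frequencies named in the settings table.
INTERVAL_HOURS = {"hourly": 1, "daily": 24, "weekly": 168}


def estimated_captures(url_count: int, frequency: str, days: int = 30) -> int:
    """Estimate total captures: whole intervals in the period times URL count."""
    intervals = (days * 24) // INTERVAL_HOURS[frequency]
    return url_count * intervals


for freq in ("hourly", "daily", "weekly"):
    print(freq, estimated_captures(url_count=5, frequency=freq))
```

For five URLs over 30 days this gives 3600 captures at hourly, 150 at daily, and 20 at weekly, which shows why hourly collection is best reserved for genuinely fast-moving targets.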

Example Scenarios

  • Capture all profile changes on a suspected sock puppet account over a 30-day period
  • Archive the contents of a takedown target before it’s removed
  • Document changes in wording on a misinformation website promoting health conspiracies
  • Track emerging content themes from an activist or cybercrime blog

Want a detailed overview of rendering options, screenshot support, and data integration?
Visit the Web Page Collector feature page on our main site.