How to Configure the Web Page Collector
Learn how to use the Web Page Collector in DigitalStakeout XTI to capture, archive, and analyze targeted URLs and web content.
The Web Page Collector in DigitalStakeout XTI allows you to collect, archive, and monitor web pages of interest on a scheduled or ad hoc basis. This tool is ideal for tracking high-risk URLs, documenting profile changes, capturing digital evidence, or watching for defacements and content changes.
Use Cases
- 📸 Evidence Preservation — Capture time-stamped screenshots and raw content
- 👁️ Profile Monitoring — Archive changes to a person’s or organization’s online footprint
- 🧪 Threat Monitoring — Track updates to suspicious infrastructure or foreign-controlled sites
- 🛡️ Executive Protection — Log public-facing mentions, threats, or impersonations
- 📂 Legal or Compliance Review — Maintain snapshots of critical content for review or audit
Setting Up a Web Page Collector
1. General Configuration
Field | Description |
---|---|
Status | Enable or disable the collector |
Monitor Name | Assign a meaningful name (e.g., "Threat Actor Blog Tracker") |
Expires On | Set an optional end date for collection |
Use Case | Select from options like Threat Intel, Evidence Logging, etc. |
Tags | Apply project, subject, or entity tags |
Send Data To | Choose the destination (folder, integration, or alerts) |
2. URL Collection Settings
Setting | Description |
---|---|
Target URL(s) | Enter one or more URLs you want to monitor or archive |
Screenshot Mode | Enable visual capture for legal or archival purposes |
Render Settings | Choose whether to execute JavaScript (headless browser rendering) |
Collection Frequency | Schedule the crawl interval (hourly, daily, weekly, etc.) |
Trigger On Match | Optionally trigger alerts when content contains specific keywords |
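To make the Trigger On Match idea concrete, here is a minimal Python sketch of keyword matching against captured page text. It is an illustration only: this article does not describe how XTI evaluates keywords, so the function name `matched_terms` and the case-insensitive, whole-word matching rule are assumptions.

```python
import re

def matched_terms(page_text: str, keywords: list[str]) -> list[str]:
    """Return the keywords found in page_text (case-insensitive, whole-word).

    Hypothetical helper for illustration; not part of the XTI product.
    """
    hits = []
    for kw in keywords:
        # Lookarounds keep "ransom" from matching inside "transom".
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        if re.search(pattern, page_text, flags=re.IGNORECASE):
            hits.append(kw)
    return hits

text = "New post: leaked credentials for sale, contact admin."
print(matched_terms(text, ["leaked credentials", "ransom", "Admin"]))
# ['leaked credentials', 'Admin']
```

Whole-word matching is a deliberate choice in this sketch: it reduces false alerts from short keywords that appear inside longer words.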
Monitoring Workflow
Each time a URL is captured:
- The full page content (HTML and rendered text) is stored
- A screenshot (if enabled) is archived
- Metadata, including title, source URL, date/time, and matched terms, is indexed
- Content is searchable and linked to the associated monitor inside XTI
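The capture steps above can be pictured as one record per capture. The Python sketch below models such a record and compares two captures by hashing their rendered text. The class name `CaptureRecord`, its field names, and the hash-based comparison are all assumptions made for illustration; they do not describe XTI's actual storage schema or change-detection method.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CaptureRecord:
    """Hypothetical shape of one archived capture (not the XTI schema)."""
    source_url: str
    title: str
    html: str
    rendered_text: str
    matched_terms: list[str] = field(default_factory=list)
    captured_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def content_hash(self) -> str:
        # Collapse whitespace so layout-only differences do not count as changes.
        normalized = " ".join(self.rendered_text.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def has_changed(previous: CaptureRecord, current: CaptureRecord) -> bool:
    """True when the rendered text differs between two captures."""
    return previous.content_hash != current.content_hash
```

Comparing rendered text rather than raw HTML is one possible design choice: it ignores markup churn such as rotating ad tags, at the cost of missing changes that affect only the markup.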
Best Practices
- ✅ Use the collector for sites whose content is likely to change or be deleted (social media, extremist blogs, doxxing forums)
- 🎯 Use keyword triggers when you're tracking narrative shifts or high-risk terms
- 🕒 Schedule frequency based on volatility of the source (e.g., hourly for fast-moving targets)
- 🗂 Combine with Keyword Feed or Profile Tracker for full context
- 🛑 Avoid over-crawling stable or irrelevant pages to preserve quota and reduce noise
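When choosing a frequency, it can help to estimate how many captures a monitor will generate. The Python sketch below does that arithmetic. It assumes one capture per URL per scheduled run; how XTI actually counts captures against quota is not stated in this article, and `estimated_captures` is a hypothetical helper.

```python
# Scheduled runs per day for each frequency (weekly is one run every 7 days).
RUNS_PER_DAY = {"hourly": 24, "daily": 1, "weekly": 1 / 7}

def estimated_captures(url_count: int, frequency: str, days: int) -> int:
    """Estimate total captures, assuming one capture per URL per run."""
    return round(url_count * RUNS_PER_DAY[frequency] * days)

print(estimated_captures(5, "hourly", 30))  # 3600
print(estimated_captures(5, "daily", 30))   # 150
print(estimated_captures(5, "weekly", 28))  # 20
```

Under these assumptions, moving five stable pages from hourly to daily collection cuts the 30-day volume from 3,600 captures to 150.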
Example Scenarios
- Capture all profile changes on a suspected sock puppet account over a 30-day period
- Archive the contents of a takedown target before it’s removed
- Document changes in wording on a misinformation website promoting health conspiracies
- Track emerging content themes from an activist or cybercrime blog
Want a detailed overview of rendering options, screenshot support, and data integration?
Visit the Web Page Collector feature page on our main site.