Open source intelligence — OSINT — is one of the most powerful and widely misunderstood disciplines in cybersecurity. It requires no hacking tools, no exploits, and no illegal access. Everything happens in plain sight. The question is whether you know where to look.

This guide covers OSINT from first principles to advanced methodology — the way security professionals actually use it, not the oversimplified version you find in most tutorials.


What OSINT Actually Is

OSINT is the collection and analysis of information from publicly available sources to produce actionable intelligence. The term originated in military and intelligence communities — the CIA, GCHQ, and NATO all have dedicated OSINT units — and has since become a core discipline in cybersecurity, investigative journalism, law enforcement, and corporate due diligence.

The key word is publicly available. OSINT does not involve accessing private systems, intercepting communications, or any form of unauthorised access. Everything covered in this guide operates entirely within legal boundaries when used responsibly.

Ethical Boundary: OSINT techniques must only be used on yourself, your organisation, or targets you have explicit written authorisation to research. Unauthorised collection and profiling of private individuals may violate GDPR (UK/EU), CCPA (California), CFAA (US), and Computer Misuse Act 1990 (UK).


The OSINT Framework

Before touching any tool, understand the methodology. OSINT professionals follow a structured process:

Phase 1 — Define Your Intelligence Requirement

What exactly do you need to know? Define your target and your objective before starting. Undirected collection produces noise, not intelligence.

Examples of clear intelligence requirements:

  • What is the external attack surface of [organisation]?
  • What personal information about [individual] is publicly accessible?
  • What technologies does [company] use in their infrastructure?

Phase 2 — Passive Reconnaissance

Collect information without directly interacting with the target. No packets sent, no footprint left.

Phase 3 — Active Reconnaissance

Interact with public-facing infrastructure to gather additional data. This leaves traces and requires explicit authorisation.

Phase 4 — Analysis & Correlation

Raw data is not intelligence. Analysis — finding connections, confirming accuracy, assessing reliability — is where OSINT becomes valuable.

Phase 5 — Reporting

Document findings clearly with source attribution, confidence levels, and recommendations.


Beginner Level — Starting Points

Google Dorking

Google’s advanced search operators are among the most underutilised OSINT tools available. Used correctly, they surface information that basic searches never return.

Essential operators:

site:target.com              — limits results to a specific domain
filetype:pdf site:target.com — finds specific file types
inurl:admin site:target.com  — finds URLs containing specific strings
intitle:"index of"           — finds exposed directory listings
"@target.com" filetype:xls   — finds spreadsheets containing email addresses

Real example: site:target.com filetype:pdf "confidential" frequently surfaces internal documents accidentally published to public web servers.

Official reference: Google Search Operators Documentation

WHOIS Lookup

Every registered domain has a WHOIS record containing registration data. Even with privacy protection enabled, historical records often contain real contact details.

Tools:

  • who.is — clean UI, historical records
  • domaintools.com — professional-grade, tracks changes over time
  • whois target.com — command line, fastest for bulk queries

What you can find: registrant name, email, organisation, registration date, name servers, registrar.

Certificate Transparency Logs

Every TLS certificate issued is logged publicly. This reveals subdomains — often including staging environments, internal tools, and forgotten infrastructure.

# Using crt.sh — free, no account required
curl "https://crt.sh/?q=%.target.com&output=json" | jq '.[].name_value' | sort -u

Or simply visit: https://crt.sh/?q=%.target.com

This consistently surfaces subdomains that DNS brute-forcing misses entirely.


Intermediate Level — Infrastructure Intelligence

Shodan

Shodan is a search engine for internet-connected devices. It continuously scans the entire internet and indexes banners, service versions, open ports, and SSL certificates.

What security professionals use it for:

  • Finding exposed databases (MongoDB, Elasticsearch, Redis)
  • Identifying outdated software versions in production
  • Mapping an organisation’s full external attack surface
  • Finding industrial control systems (ICS/SCADA) exposed to the internet
# Shodan search examples
org:"Target Company Ltd"          — all assets belonging to an organisation
ssl:"target.com"                  — assets with target.com in SSL certificate
port:3389 country:GB              — exposed RDP servers in the UK
product:"Apache httpd" version:2.2 — vulnerable Apache versions

Official resource: Shodan Documentation

According to CISA’s 2025 advisory on internet-exposed assets, exposed management interfaces remain one of the top vectors for initial access in ransomware attacks.

theHarvester

A purpose-built OSINT tool for gathering emails, subdomains, and employee names from public sources.

# Installation
pip install theHarvester

# Basic usage
theHarvester -d target.com -b google,bing,linkedin,shodan -l 500

Sources it queries: Google, Bing, LinkedIn, Hunter.io, Shodan, and more. Produces a consolidated list of email addresses, names, and infrastructure details.

Maltego

Maltego is the professional standard for OSINT data correlation. It visualises relationships between entities — people, organisations, domains, IP addresses, social accounts — through an interactive graph.

Community edition is free. Professional licence is used by law enforcement agencies, corporate investigators, and penetration testers worldwide.

Key transforms for security research:

  • Domain → DNS records → IP addresses → hosting provider
  • Email address → breached credentials → associated accounts
  • Organisation → employees → LinkedIn profiles → personal emails

Advanced Level — Social Media Intelligence (SOCMINT)

Metadata Extraction from Images

Every image taken on a smartphone contains EXIF metadata — which can include GPS coordinates, device model, software version, and timestamp.

# Using ExifTool — the standard for metadata extraction
exiftool image.jpg

# Extract GPS coordinates specifically
exiftool -gpslatitude -gpslongitude image.jpg

This is how journalists and investigators have geolocated conflict zone images, verified the location of leaked documents, and identified the devices used to capture specific photographs.

Defensive note: Strip EXIF data before publishing any images using tools like ExifTool or online services like exifpurge.com.

Reverse Image Search — Beyond Google

Google reverse image search is the starting point, but advanced OSINT uses multiple engines simultaneously:

Tool Strength
TinEye Tracks image across time — finds first appearance
Yandex Images Superior facial recognition for SOCMINT
Bing Visual Search Strong for product and location identification
PimEyes Face search engine — powerful but use ethically

Breach Data Intelligence

Have I Been Pwned (HIBP) is the authoritative public resource for checking whether an email address appears in known data breaches.

For security professionals conducting authorised assessments, breach data reveals credential reuse patterns — one of the most dangerous vulnerabilities in any organisation.

Reference: Troy Hunt’s HIBP database currently contains over 14 billion breached records. NCSC UK recommends HIBP as part of password security policy.


OSINT Tool Reference — Complete List

Category Tool Free Notes
Search Google Dorking Master this first
DNS ViewDNS.info Historical DNS records
Certificates crt.sh Subdomain enumeration
Infrastructure Shodan Partial Essential for professionals
Email OSINT theHarvester Open source
Social Maltego CE Community edition
Image ExifTool Metadata extraction
Breaches HaveIBeenPwned Industry standard
Geolocation GeoGuessr Pro Partial Image geolocation training
Dark Web OnionScan Research use only

Defending Against OSINT

Understanding OSINT is only half the picture. The other half is protecting your organisation from being the target.

For organisations:

  1. Conduct regular attack surface reviews using the tools above
  2. Audit what employee information is publicly visible on LinkedIn
  3. Implement strict email format policies to reduce harvesting
  4. Remove sensitive metadata from all published documents and images
  5. Monitor certificate transparency logs for unexpected subdomains
  6. Use Google Alerts for your organisation name and key personnel

For individuals:

  1. Audit your own digital footprint quarterly
  2. Enable privacy protection on domain registrations
  3. Strip metadata from images before publishing
  4. Review and restrict social media visibility settings
  5. Check yourself on HIBP regularly

Official Resources & References


Conclusion

OSINT is not a collection of hacking tricks. It is a disciplined intelligence methodology that, when applied rigorously and ethically, gives security professionals a complete picture of what an attacker can learn about a target before launching an attack.

Master the basics — Google dorking, WHOIS, certificate transparency — before moving to advanced tools. The most impactful OSINT finds consistently come from the simplest techniques applied thoughtfully.

The next article in this series covers Social Engineering Attack Patterns — how attackers use OSINT intelligence to build targeted phishing and pretexting campaigns.


Have corrections, additions, or want to share your own OSINT research? Get in touch.