The Internet is a huge, (usually) open collection of data that can be an extremely valuable tool for analysts, investigators, and researchers. Understanding the basics of internet investigations goes a long way toward helping you answer questions and solve problems.
Let’s start by looking at how the Internet is divided up.
Social Media
Social media is an invention of the modern web, or as older folks like to call it, Web 2.0.
Web 2.0 is a collaborative place where online activity has evolved toward communication and sharing. Social media content is so plentiful and revealing that investigators love looking at public social profiles.
There are all kinds of information to glean from social media, such as:
- Known associates, friends, and family
- Likes, causes, and passions
- History of events
- Photo and video content (a starting point for reverse image searches)
- Links to other public web content, such as other social accounts, blogs, forums, and chat profiles (IRC, for example)
The Deep Web
There are over a billion websites and even more indexed pages available through search engines. Would it surprise you to know that this is only a fraction of the websites and pages that are actually out there?
That number of websites and pages may make up only 12% of the whole searchable web.
Most people confuse the deep web with the dark web. The deep web is the huge number of websites that are publicly reachable but haven’t been indexed by a search engine.
Update 11/1/17: there’s a great writeup by Daniel Miessler on the differences in this topic.
If you created a new website, it would fall into this category. Websites that are new, or that have been around but haven’t been indexed, can be reached using specific deep web search tools, databases, and techniques.
The Dark Web
People who run websites in the dark web specifically and intentionally hide them from view.
Yes, it’s true that criminals operate in the dark web and that all manner of unsavory content appears there. But before you raise your pitchforks and chant to abolish the dark web so we’re all safer, know that your company HR portal, the one where you have to be in the office to download your paystub, is considered dark web.
Makes sense though, right? Why would you want your intranet publicly exposed?
There are a few techniques for hiding your website in the dark web. For example, indexing robots, like the ones Google uses, can be fooled or simply told not to index certain pages.
There are limits to what can be searched and how big a footprint you leave, but it’s still possible to find dark websites.
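One common way a site tells indexing robots to stay away from certain pages is a robots.txt file at the site root. Because that file is public, it can also hint at where the unindexed pages live. Below is a minimal sketch using Python's standard urllib.robotparser module. The sample file contents, the example.com address, and the helper names disallowed_paths and crawler_may_fetch are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /hr-portal/
Disallow: /staging/
Allow: /
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow rule, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop any trailing comment, then split "Field: value".
        field, _, value = line.split("#", 1)[0].partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def crawler_may_fetch(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Ask the standard-library parser whether a polite crawler may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(disallowed_paths(SAMPLE_ROBOTS_TXT))
    print(crawler_may_fetch(SAMPLE_ROBOTS_TXT, "https://example.com/hr-portal/paystub.pdf"))
```

Keep in mind that robots.txt is only a request to well-behaved crawlers. It does not block access, so a listed path may still be protected by a login or a network restriction.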
General Internet Investigations
This is essentially website research and investigation.
The Archived Web
Do you remember when people told you stuff on the Internet can’t be deleted and is online forever? This is partially why.
Even though web content can be edited or deleted, other websites and services such as The Internet Archive allow you to see content as it was in specific time periods. It’s like a nerd’s virtual time machine.
Looking at a website’s history is useful for investigating a website compromise, and it’s also kind of fun to look up old websites.
I may publish an article looking at earlier versions of the most popular web services for a good laugh.
Archive Web Providers
This section was updated 5/2/19.
- Google Cache – use cache: before the full URL in a Google search, or append the full URL to Google’s cache address
- Wayback Machine – the Internet Archive’s web archive (web.archive.org)
- WebCite – you can also append the full URL to WebCite’s query address
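For the Wayback Machine, archived copies follow a predictable address pattern: web.archive.org/web/, then a 14-digit timestamp (YYYYMMDDhhmmss), then the original URL. The archive redirects to the capture closest to that timestamp. Here is a small sketch that builds such addresses. The function names wayback_url and yearly_snapshots are hypothetical, and example.com is a placeholder.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for page_url near the moment `when`."""
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14 digits: YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def yearly_snapshots(page_url: str, first_year: int, last_year: int) -> list[str]:
    """One address per year (January 1st), handy for skimming a site's history."""
    return [
        wayback_url(page_url, datetime(year, 1, 1))
        for year in range(first_year, last_year + 1)
    ]


if __name__ == "__main__":
    for link in yearly_snapshots("http://example.com/", 2005, 2007):
        print(link)
```

Opening each link in a browser shows whichever capture the archive holds closest to that date, if it holds one at all.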
Safe Site Links
If a website does not have a history or doesn’t appear in the archived web, you may need to check reputation sites to see whether it has been flagged. Ideally, you want something to go on, but a clean scan doesn’t necessarily mean everything is on the up and up.
| Scanner | Clean Scan Message |
| --- | --- |
| Google Safe Browsing | Not currently listed as suspicious |
| McAfee SiteAdvisor | Didn’t find any problems |
| Norton Safe Web | Found no issues with this site |
| Sucuri SiteCheck | Verified clean / Not blacklisted |
| AVG ThreatLabs | No active threats were reported |
Update: AVG ThreatLabs has been discontinued. I linked to their web safety guide instead.
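If you check the same site against several scanners, it can help to record each result and flag anything that isn't a recognized clean message. The sketch below assumes you paste each scanner's result text in by hand. It does not call any scanner's API, the helper names are hypothetical, and the clean-scan wording is taken from the table above, so it may not match what the scanners show today.

```python
# Clean-scan wording copied from the table above. AVG ThreatLabs is left out
# because it has been discontinued.
CLEAN_MESSAGES = {
    "Google Safe Browsing": ("not currently listed as suspicious",),
    "McAfee SiteAdvisor": ("didn't find any problems",),
    "Norton Safe Web": ("found no issues with this site",),
    "Sucuri SiteCheck": ("verified clean", "not blacklisted"),
}


def normalize(text: str) -> str:
    """Lowercase and straighten curly apostrophes so matching is forgiving."""
    return text.replace("\u2019", "'").strip().lower()


def classify(scanner: str, message: str) -> str:
    """Return 'clean', 'review', or 'unknown scanner' for one pasted result."""
    phrases = CLEAN_MESSAGES.get(scanner)
    if phrases is None:
        return "unknown scanner"
    text = normalize(message)
    return "clean" if any(phrase in text for phrase in phrases) else "review"


def needs_review(results: dict[str, str]) -> list[str]:
    """List the scanners whose message was not a recognized clean result."""
    return [name for name, msg in results.items() if classify(name, msg) != "clean"]


if __name__ == "__main__":
    pasted = {
        "Google Safe Browsing": "Not currently listed as suspicious",
        "Norton Safe Web": "Caution",
    }
    print(needs_review(pasted))
```

An empty list only means no scanner raised a flag. As noted above, that is not proof the site is safe.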
Let me know what tools you like to use for internet investigations in the comments below!