The Dark and Mysterious - Deep Web
As Wikipedia says, “The Deep Web (also called the Deepnet, Invisible Web, or Hidden Web) is World Wide Web content that is not part of the Surface Web”. To understand the difference between the deep and surface web, we have to know how search engines like Google, Yahoo, and Bing work. Search engines obtain their listings in two ways: authors may submit their own web pages, or the search engine uses software that "crawls" or "spiders" web pages by following one hypertext link to another. The latter accounts for the bulk of the listings. Crawlers work by recording every hypertext link in every page they index. Like ripples propagating across a pond, search-engine crawlers extend their indexes further and further from their starting points.
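The link-following behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "web" here is a small hypothetical dictionary (LINKS) rather than real pages, and the function name crawl is made up for this example. It shows why a page that nothing links to never enters the index.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["story", "home"],
    "about": [],
    "story": ["home"],
    "orphan": ["home"],  # links out, but no page links to it
}

def crawl(seed, links):
    """Return pages in the order a breadth-first crawler would index them."""
    indexed = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        indexed.append(page)
        # Record every link on the page and queue the ones not yet seen.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed

index = crawl("home", LINKS)
print(index)              # ['home', 'news', 'about', 'story']
print("orphan" in index)  # False: unreachable by following links
```

Starting from "home", the crawler reaches every page connected by links, but "orphan" stays invisible to it, which is the simplest way a page ends up outside the surface web.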
So what is the surface web? The surface web is the usual web, the one we surf in our day-to-day life. Websites that can be accessed directly or are listed by a search engine like Google, Yahoo, or Bing, and that do not require any proxy service to connect, are part of the surface web. Sites that are not indexed by search engines, such as those hosted on non-HTTP(S) protocols or those that require a proxy service like I2P, Freenet, Tor, or JonDo, are part of the Deep Web (Figure 1).
Search engines cannot access the Deep Web because most of its pages are dynamically generated, are unlinked (no other page links to them, so a crawler never reaches them), store text on FTP sites or Internet Relay Chat (IRC) channels, or restrict access using the Robots Exclusion Standard, CAPTCHAs, or password protection (registration and login).
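As an illustration of the Robots Exclusion Standard mentioned above, the following sketch uses Python's standard urllib.robotparser module to check whether a well-behaved crawler may fetch a page. The robots.txt content, the example.com URLs, the crawler name "ExampleBot", and the helper may_crawl are all assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(url, agent="ExampleBot"):
    """Return True if the rules above allow this agent to fetch the URL."""
    return parser.can_fetch(agent, url)

print(may_crawl("http://example.com/index.html"))           # True
print(may_crawl("http://example.com/private/report.html"))  # False
```

Note that the standard is only a request: it keeps pages out of the indexes of crawlers that choose to honour it, and it does not stop anyone from opening the page directly.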
Before the HTTP protocol was standardized, servers used the Gopher protocol to serve information to their clients as simple text through terminals. Although modern browsers have largely dropped support for it, Gopher sites still exist and host content.
A study based on data collected in March 2000 reported the following (all figures are as of that date):
- Public information on the deep web is 400 to 550 times larger than the commonly defined World Wide Web.
- The deep web contains 7,500 terabytes of information, compared to nineteen terabytes in the surface web.
- The deep web contains nearly 550 billion individual documents, compared to the one billion of the surface web.
- More than 200,000 deep web sites exist.
- Sixty of the largest deep web sites collectively contain about 750 terabytes of information, sufficient by themselves to exceed the size of the surface web forty times.
- On average, deep web sites receive fifty per cent greater monthly traffic than surface sites and are more highly linked to than surface sites; however, the typical (median) deep web site is not well known to the Internet-searching public.
- The deep web is the fastest growing category of new information on the Internet.
- Deep web sites tend to be narrower, with deeper content, than conventional surface sites.
- Total quality content of the deep web is 1,000 to 2,000 times greater than that of the surface web.
- Deep web content is highly relevant to every information need, market, and domain.
- More than half of the deep web content resides in topic-specific databases.
- A full ninety-five per cent of the deep web is publicly accessible information, not subject to fees or subscriptions.
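The ratios quoted in the list above can be checked against its own raw figures. The short sketch below only repeats that arithmetic; the variable names are made up for this example, and the numbers are the ones given in the list.

```python
# Figures quoted in the list above (data collected in March 2000).
deep_tb = 7500                # terabytes in the deep web
surface_tb = 19               # terabytes in the surface web
deep_docs = 550_000_000_000   # documents in the deep web
surface_docs = 1_000_000_000  # documents in the surface web
top60_tb = 750                # terabytes held by the sixty largest sites

size_ratio = deep_tb / surface_tb      # ratio by volume of information
doc_ratio = deep_docs / surface_docs   # ratio by number of documents
top60_ratio = top60_tb / surface_tb    # sixty largest sites vs. surface web

print(round(size_ratio, 1))   # 394.7
print(round(doc_ratio, 1))    # 550.0
print(round(top60_ratio, 1))  # 39.5
```

By volume the deep web comes out roughly 395 times larger and by document count 550 times larger, which matches the "400 to 550 times" range, and the sixty largest sites alone hold roughly forty times the surface web's data.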
About 96 percent of the Internet is beyond the reach of search engines. So what exactly is on the Deep Web? While surfing the Web, you are really just floating on the surface. The Deep Web contains thousands of terabytes of information, covering everything from boring statistics to human body parts for sale. The vast majority of the Deep Web consists of pages with potentially valuable information. While most people talk about the drug and arms markets, illicit services, banned pornography, and pirated goods, part of the Deep Web is also used by government agencies for covert communication and by activists, rebels, and journalists around the world to voice their opinions on incriminating topics. The most notable examples are Chinese, Iranian, and Syrian users, and those involved in the Arab Spring, reporting on their respective governments' corrupt and authoritarian practices.
In the media, however, the most discussed Deep Web site is the Silk Road. The Silk Road is a marketplace for all sorts of illegal substances and services. It is said to have handled about 1.2 billion dollars' worth of transactions, earning about 80 million dollars in commission. Bitcoin, an online “crypto-currency” that lets buyers and sellers trade anonymously, is the dominant currency in the Deep Web. The nature of the Deep Web makes it difficult for law enforcement agencies to track down users involved in illicit transactions. Although law enforcement agencies have had some success in shutting down a few such sites, this has hardly made a dent in the Deep Web.
A 2001 report, the best to date, estimates that 54% of deep web sites are databases. Among the world's largest are those of the U.S. National Aeronautics and Space Administration (NASA), the Patent and Trademark Office, and the Securities and Exchange Commission's EDGAR search system, all of which are public. The next batch consists of pages kept private by companies that charge a fee to access them, such as government documents on LexisNexis and Westlaw or academic journals on Elsevier. Another 13% of pages lie hidden because they are usually found on an intranet. These internal networks, at corporations or universities for example, give access to message boards, personnel files, or industrial control panels that can flip a light switch or shut down a whole power plant.
The Deep Web is a double-edged sword for governments: while it allows them to communicate securely and covertly, the same capability is also used by their adversaries. Hacktivists, cyber criminals, and other such elements also enjoy the benefits offered by the Deep Web. This has forced governments to invest heavily in cyber monitoring tools and agencies. Some of the best-known agencies and their mass electronic surveillance projects are the U.S. government's NSA (PRISM), the British government's GCHQ (Tempora), and the Indian government's DRDO (Netra). Under pressure from such agencies, local ISPs are roped into divulging details about their clients.
Although controversial, most governments end up running some sort of mass electronic surveillance program in the name of national security. It is now public knowledge that some private firms supply the hardware and software needed to perform mass surveillance. To what extent this is used against genuine enemies of the state rather than against a state's own citizens can only be guessed, and this uncertainty has given rise to open-source proxies with strong encryption.
So how do you access the Deep Web? Depending on what you are trying to access, there are four major proxy services: Tor, Freenet, I2P, and JonDo. Each of these services lets you connect to its own proxy network.
For a beginner, Tor is a good place to start; it is simple and relatively fast. You can connect to the Tor network using the Tor Browser Bundle. A simple but very insecure alternative is to replace the “onion” part of the address with “tor2web.org”. For example, http://xyz.onion becomes http://xyz.tor2web.org, which can be opened in your regular browser without Tor running in the background; however, this should only be a last resort. If you really want to be anonymous, it is advisable to use the Tails or Liberté live operating system. These have Tor built in, along with a host of other features designed with one thing in mind: anonymity. Unlike most regular browsers, they leave no trace of your activities on your computer.
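The address rewrite described above is a plain string substitution, which can be sketched as follows. The function name to_tor2web and its gateway parameter are assumptions made up for this example, and nothing here checks whether the gateway is actually reachable. The sketch only shows the transformation and is not a recommendation to use it.

```python
from urllib.parse import urlsplit, urlunsplit

def to_tor2web(url, gateway="tor2web.org"):
    """Rewrite http://<name>.onion/... to http://<name>.<gateway>/..."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith(".onion"):
        raise ValueError("not a .onion address: " + url)
    # Swap the trailing "onion" label for the gateway domain.
    # Any port in the original URL is dropped in this sketch.
    new_host = host[: -len("onion")] + gateway
    return urlunsplit(
        (parts.scheme, new_host, parts.path, parts.query, parts.fragment)
    )

print(to_tor2web("http://xyz.onion"))  # http://xyz.tor2web.org
```

Because the gateway sits between the browser and the hidden site, it can see which address is being requested, which is one reason this shortcut offers the visitor no anonymity.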
Another important advantage of using such an operating system for exploring the Deep Web is protection from malware. Most Internet security software and online (blacklist) or browser-based malware protection systems offer little or no protection in the Deep Web.
Here is a link to a list of .onion sites: http://pastebin.com/v5Yq66sH
Another way of understanding the Deep Web is in terms of levels; the Internet as a whole can be divided into 8 levels. The first level, also known as the surface web, is the Internet most of us already know: directly accessible and indexed by search engines. The second level is the part of the Internet that is not indexed by search engines but is still directly accessible, with no proxy required.
The Deep Web starts from the third level onwards. No search engine is able to index these sites, and they can only be reached through a proxy network like Tor, I2P, Freenet, or JonDo. Although this is the Deep Web, most content on level 3 is publicly accessible (through a proxy) without any restrictions.
In the fourth level, also known as the “Charter web”, sites become more restrictive and begin using stronger security measures such as registration and login, invite-only membership, availability only at specific times or dates, restriction to certain IP addresses, or a combination of these. Unlike most websites, they are not interested in maximizing traffic and keep a very low profile even within the Deep Web. This level has an even deeper level within it: Closed Shell Systems (CSS).
These comprise a single computer or a network of systems that is not connected to any external network at all. They can only be accessed from within the network, so an attacker cannot connect to them without physical access. Many companies have sensitive internal networks behind a firewall (green zone); that is a different scenario and is still considered insecure in this context. CSS networks have no physical (wired or wireless) connection to any other network. What these networks contain is left to the reader's imagination.
Levels 5 to 8 are purely speculative and extremely difficult (if not impossible) to verify. There are many inconclusive accounts of what resides there, ranging from quantum computers to ultimate control over the Internet.
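Levels one to four, as described above, differ in only three properties: whether a site is indexed, whether it needs a proxy, and whether access is restricted. That can be summarised as a small decision rule. The function web_level and its parameter names are made up for this example, and the sketch deliberately ignores Closed Shell Systems and the speculative levels 5 to 8.

```python
def web_level(indexed, needs_proxy, restricted):
    """Map three yes/no properties of a site onto levels 1 to 4."""
    if indexed:
        return 1  # surface web: listed by search engines
    if not needs_proxy:
        return 2  # not indexed, but directly reachable
    if not restricted:
        return 3  # proxy required, otherwise open to anyone
    return 4      # proxy required and access-restricted ("Charter web")

print(web_level(indexed=True, needs_proxy=False, restricted=False))   # 1
print(web_level(indexed=False, needs_proxy=False, restricted=False))  # 2
print(web_level(indexed=False, needs_proxy=True, restricted=False))   # 3
print(web_level(indexed=False, needs_proxy=True, restricted=True))    # 4
```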
About the Author
“hackerDesk” is a security research group based in India that has published exploits and bugs on exploit-db, packet storm, security focus, and osvdb, and has over 50 bug bounties to its credit. Facebook, Barracuda, Yahoo, Twitter, eBay, Microsoft, Nokia *Top Reporter*, Adobe, and Sony are just some of the high-profile organizations in which it has found major security vulnerabilities.