What Happens When You Type A URL In Your Browser And Press Enter
More than half of the world’s population regularly uses the internet. We rely on it for just about everything: checking email, calling an Uber, and ordering food and clothes. Yet in all these interactions it can be hard to see how the internet is used and navigated behind the scenes.
This article will allow you to understand what goes on when you enter a URL into a browser.
First, let’s discuss the web browser.
The Web Browser
A web browser is a program that most people use to view websites on the Internet. Most modern web browsers, like Google Chrome and Mozilla Firefox, have many built-in features that hide some of the underlying processes involved in connecting to a web page in order to improve the user experience. We will discuss some of these features in later sections.
The client-server model
Before diving into the details of the web infrastructure, it’s important to understand the client-server model. On the World Wide Web, the network is organized between clients, which request data, and servers, which store the data and manage most of its processing. For example, a browser is considered a client, and a server would be the computer program serving data to that client. Server is also a term describing the physical machine on which the server program is running. Each website, application, or service can have multiple servers working behind the scenes to perform the processes needed by the client(s). In general, physical servers are grouped in server farms, or data centers.
So what is a website?
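To make the two roles concrete, here is a minimal sketch of one request and one reply between a client and a server. It runs both roles inside a single process over a pair of connected sockets, which is a simplification: a real browser and server run on different machines. The PAGES dictionary and the function names are made up for illustration.

```python
import socket

# Hypothetical data held by the server.
PAGES = {"/index.html": "<h1>Hello</h1>"}

def serve_one_request(conn: socket.socket) -> None:
    """Server role: read the requested path and send back the stored data."""
    path = conn.recv(1024).decode()
    conn.sendall(PAGES.get(path, "not found").encode())

def fetch(path: str) -> str:
    """Client role: send a request, then read the server's reply."""
    client, server = socket.socketpair()  # two connected in-process sockets
    with client, server:
        client.sendall(path.encode())
        serve_one_request(server)
        return client.recv(1024).decode()

print(fetch("/index.html"))
```

The client only knows what to ask for; the server owns the data and decides what to send back.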
Website
At its core, a website is a set of files. A bare-bones website is just an HTML file; however, most websites today consist of several interlinked files, including JavaScript and CSS.
Websites exist on powerful computers called servers. Servers are usually located somewhere remote from your computer. In order to display the website, your browser has to retrieve it from the server. Each server has an address, called an IP address, through which it can be found and accessed.
Retrieving a website from a server is a lot like retrieving goods from a warehouse. The warehouse is akin to the server because it is remote, has an address just like servers have IP addresses and stores goods just like servers store or host websites.
Every device that is connected to the Internet has an IP address, regardless of whether it is a server or not. Your computer has an IP address too.
If your browser knows the IP address of a website’s server, it will be able to access it. However, all it knows is the URL.
URL breakdown
http://www.holbertonschool.com/index.html
A Uniform Resource Locator (URL) is used to locate a resource, in our case, the website on a server.
- http:// tells the browser that we want to access a page using the Hypertext Transfer Protocol (HTTP). This is the protocol that browsers use to request web pages from servers. Other protocols serve other purposes.
- www is a subdomain of holbertonschool.com. This part refers to a specific location (server) inside the domain where resources are located.
- Domain name: a unique name that identifies a website. Examples include facebook.com, holbertonschool.com, and google.com. A domain name specifies which server a resource/website is on.
- Path to file: The files on the server have locations, just as each file in your PC’s file system does. The path in the URL (here, /index.html) tells the server where the requested resource is located.
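As a quick illustration, Python’s standard urllib.parse module can split the example URL into these parts. Splitting the host name at the first dot to separate the subdomain from the domain is a simplifying assumption that happens to work for this URL; it is not a general rule.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.holbertonschool.com/index.html")

scheme = parts.scheme    # protocol: "http"
host = parts.hostname    # full host name: "www.holbertonschool.com"
path = parts.path        # path to file: "/index.html"

# Naive split for illustration only: everything before the first dot
# is treated as the subdomain, the rest as the domain name.
subdomain, _, domain = host.partition(".")

print(scheme, subdomain, domain, path)
```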
You might be wondering why URLs are used anyways. Why not just type IP addresses directly into browsers?
- Humans are generally not very good at memorizing random numbers. Imagine memorizing 73.22.49.2 over www.facebook.com. It’s easy enough for one website, but imagine having to do this for all the websites you access in a day. You’d likely end up with the IP addresses written somewhere!
- The server hosting the website may change over time, which would mean the IP address would too. Keeping track of these changes is not easy or convenient for users.
How does a web browser find an IP address using just a URL?
DNS
A DNS query (also known as a DNS request) is a request for information sent from a user’s computer (the DNS client) to a DNS server. In most cases, a DNS request asks for the IP address associated with a domain name. An attempt to reach a domain is actually a DNS client querying DNS servers to get the IP address related to that domain.
Each device connected to the Internet has a unique IP address which other machines use to find the device. DNS servers eliminate the need for humans to memorize IP addresses such as 192.168.1.1 (in IPv4), or more complex newer alphanumeric IP addresses such as 2400:cb00:2048:1::c629:d7a2 (in IPv6).
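For a feel of the two address formats, here is a small sketch that uses Python’s standard ipaddress module to parse the two example addresses. The describe helper is a name made up for this example.

```python
import ipaddress

def describe(text: str):
    """Return (IP version, address length in bits) for an address string."""
    addr = ipaddress.ip_address(text)
    return addr.version, addr.max_prefixlen

for text in ("192.168.1.1", "2400:cb00:2048:1::c629:d7a2"):
    version, bits = describe(text)
    print(f"{text} is an IPv{version} address ({bits} bits)")
```

An IPv4 address is 32 bits long and an IPv6 address is 128 bits long, which is why the newer format looks so much more complex.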
What are the steps in a DNS lookup?
For most situations, DNS is concerned with a domain name being translated into the appropriate IP address. To learn how this process works, it helps to follow the path of a DNS lookup as it travels from a web browser, through the DNS lookup process, and back again. Let’s take a look at the steps.
Note: Often DNS lookup information will be cached either locally inside the querying computer or remotely in the DNS infrastructure. There are typically 8 steps in a DNS lookup. When DNS information is cached, steps are skipped from the DNS lookup process which makes it quicker. The example below outlines all 8 steps when nothing is cached.
The 8 steps in a DNS lookup:
- A user types ‘example.com’ into a web browser and the query travels into the Internet and is received by a DNS recursive resolver.
- The resolver then queries a DNS root nameserver (.).
- The root server then responds to the resolver with the address of a Top Level Domain (TLD) DNS server (such as .com or .net), which stores the information for its domains. When searching for example.com, our request is pointed toward the .com TLD.
- The resolver then makes a request to the .com TLD.
- The TLD server then responds with the IP address of the domain’s nameserver, example.com.
- Lastly, the recursive resolver sends a query to the domain’s nameserver.
- The IP address for example.com is then returned to the resolver from the nameserver.
- The DNS resolver then responds to the web browser with the IP address of the domain requested initially.
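Once the 8 steps of the DNS lookup have returned the IP address for example.com, the browser can request the web page:
- The browser makes an HTTP request to the IP address.
- The server at that IP address returns the web page to be rendered in the browser.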
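The lookup chain can be sketched as a toy simulation. Nothing here touches the network: the dictionaries below are hypothetical stand-ins for the root, TLD, and authoritative nameservers, and 192.0.2.10 is a reserved documentation address, not the real address of example.com.

```python
# Hypothetical, hard-coded stand-ins for real DNS servers.
ROOT = {"com": "tld-com"}                          # root knows the TLD servers
TLD = {"tld-com": {"example.com": "ns-example"}}   # TLD knows the nameservers
AUTH = {"ns-example": {"example.com": "192.0.2.10"}}  # nameserver knows the IP

cache = {}  # answers the resolver has already looked up

def resolve(domain: str):
    """Return (ip, servers queried), mimicking a recursive resolver."""
    if domain in cache:
        return cache[domain], []          # cached: every query is skipped
    tld = domain.rsplit(".", 1)[-1]       # "example.com" -> "com"
    tld_server = ROOT[tld]                # ask the root server
    nameserver = TLD[tld_server][domain]  # ask the TLD server
    ip = AUTH[nameserver][domain]         # ask the domain's nameserver
    cache[domain] = ip
    return ip, ["root", tld_server, nameserver]

first = resolve("example.com")   # walks the whole chain
second = resolve("example.com")  # answered from the cache
print(first, second)
```

The second call shows why caching makes lookups quicker: the resolver answers without contacting any server.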
Protocols: TCP/IP
We mentioned how domain names map to IP addresses, but IP is not the only protocol used by the Internet. The Internet Protocol Suite is often referred to as TCP/IP (TCP stands for Transmission Control Protocol).
TCP/IP was developed by the U.S. Department of Defense to specify how computers transfer data from one device to another. TCP/IP puts a lot of emphasis on accuracy, and it has several steps to ensure that data is correctly transmitted between the two computers.
Here’s one way it does that. If the system were to send the whole message in one piece, and if it were to encounter a problem, the whole message would have to be re-sent. Instead, TCP/IP breaks each message into packets, and those packets are then reassembled on the other end. In fact, each packet could take a different route to the other computer, if the first route is unavailable or congested.
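Here is a simplified sketch of that idea. It tags each chunk of a message with a sequence number, shuffles the packets to imitate different routes and arrival orders, and rebuilds the message on the other end. The function names and the tiny 4-byte packet size are assumptions for illustration; real TCP is considerably more involved.

```python
import random

def to_packets(message: bytes, size: int):
    """Split a message into (sequence number, chunk) packets."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Reorder packets by sequence number and join the payloads."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"hello world"
packets = to_packets(message, 4)
random.shuffle(packets)          # packets may arrive in any order
restored = reassemble(packets)
print(restored)
```

If one packet were lost, only that packet would need to be sent again rather than the whole message.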
In addition, TCP/IP divides the different communications tasks into layers. Each layer has a different function. Data goes through four individual layers before it is received on the other end (as explained in the following section). TCP/IP then goes through these layers in reverse order to reassemble the data and to present it to the recipient.
The four layers of the TCP/IP model
TCP/IP is not a single protocol: its model is split into four distinct layers, and the protocols in those layers, used together, are referred to as a suite of protocols.
Datalink layer
The datalink layer (also called the link layer, network interface layer, or physical layer) is what handles the physical parts of sending and receiving data using the Ethernet cable, wireless network, network interface card, device driver in the computer, and so on.
Internet layer
The internet layer (also called the network layer) controls the movement of packets around the network.
Transport layer
The transport layer is what provides a reliable data connection between two devices (this is the job of TCP). It divides the data into packets, acknowledges the packets that it has received from the other device, and makes sure that the other device acknowledges the packets it receives.
Application layer
The application layer is the group of applications that require network communication. This is what the user typically interacts with, such as email and messaging. Because the lower layers handle the details of communication, the applications don’t need to concern themselves with this.
The purpose of the layers is to keep things standardized, without numerous hardware and software vendors having to manage communication on their own. It’s like driving a car: All the manufacturers agree on where the pedals are, so that’s something we can count on between cars. It also means that certain layers can be updated, such as to improve performance or security, without having to upgrade the entire thing.
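The way data travels down the layers on one side and back up in reverse order on the other can be sketched like this. The bracketed text headers are a deliberate simplification and an assumption of this example; real headers are binary structures carrying addresses, ports, checksums, and so on.

```python
# Layers in the order the sender passes through them, top to bottom.
LAYERS = ["application", "transport", "internet", "link"]

def send(data: str) -> str:
    """Wrap the data with one header per layer, top layer first."""
    for layer in LAYERS:
        data = f"[{layer}]{data}"
    return data

def receive(frame: str) -> str:
    """Strip the headers in reverse order to recover the original data."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame

frame = send("hi")
print(frame)
print(receive(frame))
```

Each layer only reads and removes its own header, which is what allows one layer to be changed without touching the others.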
Firewall
A firewall is a network security device that monitors and filters incoming and outgoing network traffic based on an organization’s previously established security policies. At its most basic, a firewall is essentially the barrier that sits between a private internal network and the public Internet. A firewall’s main purpose is to allow non-threatening traffic in and to keep dangerous traffic out.
Types of Firewalls
- Packet filtering: Each packet’s header information (such as source and destination addresses and ports) is checked against the filter’s rules, and the packet is passed on or dropped accordingly.
- Proxy service: A network security system that protects the network by filtering messages at the application layer.
- Stateful inspection: Dynamic packet filtering that monitors active connections to determine which network packets to allow through the firewall.
- Next Generation Firewall (NGFW): A deep packet inspection firewall with application-level inspection.
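As an illustration of the simplest type, packet filtering, here is a minimal sketch of a rule list that is checked from top to bottom, with the first matching rule deciding the packet’s fate. The rule format, the addresses (taken from reserved documentation ranges), and the default-deny policy are all assumptions made for this example; real firewalls have far richer rule languages.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    action: str                  # "allow" or "deny"
    port: Optional[int] = None   # None matches any destination port
    source_prefix: str = ""      # "" matches any source address

# Hypothetical policy: block one address range, allow web traffic.
RULES = [
    Rule("deny", source_prefix="203.0.113."),
    Rule("allow", port=80),
    Rule("allow", port=443),
]

def filter_packet(source_ip: str, dest_port: int, rules=RULES) -> str:
    """Return the action of the first matching rule, denying by default."""
    for rule in rules:
        if rule.port not in (None, dest_port):
            continue
        if not source_ip.startswith(rule.source_prefix):
            continue
        return rule.action
    return "deny"

print(filter_packet("198.51.100.7", 443))
```

Rule order matters here: the deny rule comes first, so traffic from the blocked range is dropped even when it targets an allowed port.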
Why Do We Need Firewalls?
Firewalls, especially Next Generation Firewalls, focus on blocking malware and application-layer attacks. Along with an integrated intrusion prevention system (IPS), these Next Generation Firewalls are able to react quickly and seamlessly to detect and combat attacks across the whole network. Firewalls can act on previously set policies to better protect your network and can carry out quick assessments to detect invasive or suspicious activity, such as malware, and shut it down. By leveraging a firewall for your security infrastructure, you’re setting up your network with specific policies to allow or block incoming and outgoing traffic.
HTTPS/SSL
HTTPS
Hypertext Transfer Protocol Secure (HTTPS) is an extension of the Hypertext Transfer Protocol (HTTP). It is used for secure communication over a computer network, and is widely used on the Internet. In HTTPS, the communication protocol is encrypted using Transport Layer Security (TLS) or, formerly, Secure Sockets Layer (SSL). The protocol is therefore also referred to as HTTP over TLS or HTTP over SSL.
The principal motivations for HTTPS are authentication of the accessed website, and protection of the privacy and integrity of the exchanged data while in transit. It protects against man-in-the-middle attacks, and the bidirectional encryption of communications between a client and server protects the communications against eavesdropping and tampering. In practice, this provides a reasonable assurance that one is communicating with the intended website without interference from attackers.
SSL Certificate
SSL stands for Secure Sockets Layer and, in short, it’s the standard technology for keeping an internet connection secure and safeguarding any sensitive data that is being sent between two systems. It prevents criminals from reading and modifying any information transferred, including potential personal details. The two systems can be a server and a client (for example, a shopping website and a browser) or two servers (for example, an application with personally identifiable information or with payroll information).
It does this by making sure that any data transferred between users and sites, or between two systems, remains impossible to read. It uses encryption algorithms to scramble data in transit, preventing hackers from reading it as it is sent over the connection. This information could be anything sensitive or personal, including credit card numbers and other financial information, names, and addresses.
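To see what a client does on its side, here is a short sketch using Python’s standard ssl module. It only inspects the default client settings and makes no network connection; the commented-out lines show roughly how the context would wrap a real connection, with the host name as a placeholder.

```python
import ssl

# Default client-side settings recommended by the standard library.
context = ssl.create_default_context()

# The server must present a certificate that can be verified...
verifies_certificate = context.verify_mode == ssl.CERT_REQUIRED
# ...and the certificate must match the host name we asked for.
checks_hostname = context.check_hostname

print(verifies_certificate, checks_hostname)

# With network access, the context would wrap a TCP socket roughly like this:
# import socket
# host = "www.holbertonschool.com"
# with socket.create_connection((host, 443)) as sock:
#     with context.wrap_socket(sock, server_hostname=host) as tls:
#         print(tls.version())
```

These two checks are what give the authentication described above: encryption alone would not tell you who is on the other end.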
Load-balancer
As we mentioned earlier, websites live on servers. For most websites with significant traffic, hosting everything on a single server would be impractical. Plus, it would create a Single Point of Failure (SPOF), because a single failure of that server, or a single attack on it, would take the whole site down.
As the need for higher availability and security rose, websites started increasing the number of servers they have, organizing them in clusters, and using load-balancers. A load-balancer is a software program that distributes network requests between several servers, following a load-balancing algorithm. HAProxy is a very popular load-balancer, and examples of algorithms that we can use are round-robin, which distributes requests evenly by cycling through all the servers in sequence, and least-connection, which sends each request to the server with the fewest active connections.
A load balancer is a device that acts as a reverse proxy and distributes network or application traffic across a number of servers. Load balancers are used to increase capacity (concurrent users) and reliability of applications. They improve the overall performance of applications by decreasing the burden on servers associated with managing and maintaining application and network sessions, as well as by performing application-specific tasks.
Load balancers are generally grouped into two categories: Layer 4 and Layer 7 (the numbers refer to layers of the OSI model). Layer 4 load balancers act upon data found in network and transport layer protocols (IP, TCP, UDP). Layer 7 load balancers distribute requests based upon data found in application layer protocols such as HTTP.
Requests are received by both types of load balancers and they are distributed to a particular server based on a configured algorithm. Some industry standard algorithms are:
- Round robin
- Weighted round robin
- Least connections
- Least response time
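Two of these algorithms are simple enough to sketch in a few lines. The server names and connection counts below are invented for the example, and a real load balancer such as HAProxy is configured rather than hand-coded like this.

```python
from itertools import cycle

# Hypothetical pool of backend servers.
SERVERS = ["web-01", "web-02", "web-03"]

def round_robin(servers):
    """Yield servers one after another, starting over at the end."""
    return cycle(servers)

def least_connections(active: dict) -> str:
    """Pick the server that currently has the fewest open connections."""
    return min(active, key=active.get)

picker = round_robin(SERVERS)
first_four = [next(picker) for _ in range(4)]
print(first_four)

print(least_connections({"web-01": 5, "web-02": 2, "web-03": 7}))
```

Round robin ignores how busy each server is, while least connections adapts to the current load, which helps when some requests take much longer than others.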
Web server
A web server is server software, or hardware dedicated to running this software, that can satisfy client requests on the World Wide Web. A web server can, in general, contain one or more websites. A web server processes incoming network requests over HTTP and several other related protocols.
The primary function of a web server is to store, process and deliver web pages to clients. The communication between client and server takes place using the Hypertext Transfer Protocol (HTTP). Pages delivered are most frequently HTML documents, which may include images, style sheets and scripts in addition to the text content.
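The core of that job can be sketched as a function that turns the text of an HTTP request into the text of an HTTP response. This is a bare-bones illustration, not how production web servers are written: the FILES dictionary is a made-up stand-in for files on disk, and only GET requests and three status codes are handled.

```python
# Hypothetical in-memory "files" the server can deliver.
FILES = {"/index.html": "<h1>Home</h1>"}

def handle(request: str) -> str:
    """Parse the HTTP request line and build a minimal response."""
    request_line = request.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ")
    if method != "GET":
        return "HTTP/1.1 405 Method Not Allowed\r\n\r\n"
    body = FILES.get(path)
    if body is None:
        return "HTTP/1.1 404 Not Found\r\n\r\n"
    return (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode())}\r\n"
        "\r\n"
        f"{body}"
    )

request = "GET /index.html HTTP/1.1\r\nHost: www.holbertonschool.com\r\n\r\n"
print(handle(request))
```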
Application server
A web server accepts and fulfills requests from clients for static content (i.e., HTML pages, files, images, and videos) from a website. Web servers handle HTTP requests and responses only.
An application server exposes business logic to the clients, which generates dynamic content. It is a software framework that transforms data to provide the specialized functionality offered by a business, service, or application. Application servers enhance the interactive parts of a website that can appear differently depending on the context of the request.
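To illustrate dynamic content, here is a small sketch written against WSGI, the standard Python interface between web servers and application code. Choosing WSGI is an assumption of this example, as are the greeting logic and the call helper, which stands in for the web server handing a request to the application.

```python
from html import escape
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """WSGI application: build the page from the request's query string."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["guest"])[0]
    body = f"<p>Hello, {escape(name)}!</p>".encode()
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call(query_string: str) -> bytes:
    """Invoke the app the way a web server would, without any network."""
    environ = {"QUERY_STRING": query_string}
    setup_testing_defaults(environ)  # fill in the other required keys
    return b"".join(app(environ, lambda status, headers: None))

print(call("name=Ada"))
print(call(""))
```

The same URL path produces different pages depending on the request, which is the difference from the static files a web server delivers.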
The illustration below highlights the difference in their architecture:
The Database
A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.
The database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS software additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a “database system”. Often the term “database” is also used to loosely refer to any of the DBMS, the database system or an application associated with the database.
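As a small example of an application talking to a DBMS, the sketch below uses SQLite through Python’s standard sqlite3 module with a temporary in-memory database. The users table and its contents are invented for the example, and SQLite is an embedded engine, so this shows the application-to-DBMS interaction rather than a separate database server.

```python
import sqlite3

# An in-memory database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")

# The application asks the DBMS to define a structure and store data.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("Ada",), ("Linus",)])

# Later, it asks the DBMS to find the data again.
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(rows)

conn.close()
```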
Database Servers
Database servers store and manage databases and provide data access for authorized users. This type of server keeps the data in a central location that can be regularly backed up. It also allows users and applications to centrally access the data across the network. A large number of the databases used in your organization can be kept on one server or a group of servers that are specifically configured to protect data and service client requests.
Sources
https://www.cloudflare.com/learning/dns/what-is-dns/
https://www.websecurity.digicert.com/security-topics/what-is-ssl-tls-https