The what, the why, and the how of WebSockets

Web Sockets are an integral part of many modern systems. But have you ever wondered how a Web Socket actually works, or whether your system even needs one? In this post, we are going to cover the basics of Web Sockets, i.e., the what, the why, and the how.

What is a Web Socket?

A Web Socket is a way to have bi-directional, persistent, real-time communication between a client and a server. Browsers started supporting Web Sockets around 2010, and today almost all browsers support them natively.

If we look at how things work in the HTTP protocol, the regular request-response mechanism is uni-directional, i.e., the client sends a request, the server responds with the data, and the exchange is over. Even when the underlying connection is kept alive for reuse, the server cannot send anything until the client asks again.

With Web Sockets, the client can still send requests to the server, but the server can also send data to the client without the client asking for it, because the connection between them stays open. This behavior makes it bi-directional.


Imagine building an application where users can chat with each other, built on plain HTTP. Every time a user wants to see whether they have received any messages, they have to refresh the page or otherwise trigger a request to the server. This isn’t a real-time experience for the user.

On the other hand, if this part of the application is built over Web Sockets, the user never has to send a request to fetch messages. The server pushes each new message to the client (the user’s browser here) as it arrives, which makes the experience truly real-time.

What did they use before Web Sockets?

Before Web Sockets, there were a few ways to approximate this kind of real-time communication.

One way was to keep sending AJAX requests every few milliseconds or seconds, depending on the nature of the feature you are building. This mechanism is called short polling. The problem with this method is that there will always be some delay in receiving a message, up to the length of the polling interval. It is also resource-intensive, as requests are constantly sent to the server to fetch the latest data, even when nothing has changed.
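
As a rough illustration, the loop below sketches short polling in Python. The fetch_messages function and the INBOX list are hypothetical stand-ins for an HTTP endpoint and its data, so the example runs without a network; a real client would issue an actual request in that spot.

```python
import time

# Hypothetical stand-in for an HTTP endpoint that returns new messages.
# It is a plain function so the sketch runs without a network.
INBOX = [(1, "hi"), (2, "are you there?")]

def fetch_messages(since_id):
    """Return every (id, text) pair newer than since_id."""
    return [m for m in INBOX if m[0] > since_id]

def short_poll(rounds, interval_seconds=0.01):
    """Ask the server for new messages at a fixed interval, 'rounds' times."""
    last_seen = 0
    received = []
    requests_sent = 0
    for _ in range(rounds):
        requests_sent += 1
        for msg_id, text in fetch_messages(last_seen):
            received.append(text)
            last_seen = msg_id
        # A message that arrives now has to wait for the next round.
        time.sleep(interval_seconds)
    return received, requests_sent

messages, requests_sent = short_poll(rounds=5)
print(messages, requests_sent)
```

Both messages arrive on the first request, yet the client still sends four more requests that return nothing, which is the wasted work described above.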

Another way was long polling. Long polling is a technique that tries to emulate real-time communication between the client and the server. The client sends a request and the server holds it open until it has data to send back. As soon as the client receives the response, it immediately sends another request so that a connection is always waiting for the next piece of data.
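
The hold-then-reissue cycle can be sketched in a single Python process. This is only a simulation under stated assumptions: the queue named pending stands in for the server's outgoing messages, and handle_long_poll plays the role of a request handler that blocks until data exists. None of these names come from a real library.

```python
import queue
import threading

# Hypothetical in-process stand-in for the server: the queue holds messages
# that are waiting to be delivered to one client.
pending = queue.Queue()

def handle_long_poll(timeout_seconds):
    """Server side: hold the request open until data exists or the timeout passes."""
    try:
        return pending.get(timeout=timeout_seconds)
    except queue.Empty:
        return None  # nothing arrived in time; the client simply asks again

def long_poll_client(expected, timeout_seconds=1.0):
    """Client side: re-issue the request as soon as the previous one returns."""
    received = []
    while len(received) < expected:
        message = handle_long_poll(timeout_seconds)
        if message is not None:
            received.append(message)
    return received

def producer():
    """Simulates messages becoming available on the server."""
    for text in ("hi", "are you there?"):
        pending.put(text)

worker = threading.Thread(target=producer)
worker.start()
long_poll_messages = long_poll_client(expected=2)
worker.join()
print(long_poll_messages)
```

Compared with short polling, the client makes one request per message rather than one per interval, at the cost of the server keeping a request open while it waits.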

There is another solution called Server-Sent Events, which comes very close to solving the problems Web Sockets solve. Server-Sent Events are a standard way to stream data from the server to the client over HTTP, much as Web Sockets do. The main downside is that the communication is one-way, i.e., the client cannot send data to the server over the same channel. Because of this limitation, Server-Sent Events are an excellent solution for specific kinds of use cases, but not a replacement for Web Sockets.
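
To make the one-way stream more concrete, here is a small sketch of the text/event-stream wire format that Server-Sent Events use: each event is a group of "field: value" lines ended by a blank line. The helper names format_sse and parse_sse are made up for this example, and the parser is deliberately simplified (it assumes a space after the colon and ignores fields such as id and retry).

```python
def format_sse(data, event=None):
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # a blank line ends the event

def parse_sse(stream):
    """Split a received stream back into (event, data) pairs."""
    events = []
    for block in stream.split("\n\n"):
        if not block:
            continue
        event, data = "message", []  # "message" is the default event type
        for line in block.split("\n"):
            field, _, value = line.partition(": ")
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
        events.append((event, "\n".join(data)))
    return events

stream = format_sse("hi") + format_sse("line one\nline two", event="chat")
sse_events = parse_sse(stream)
print(sse_events)
```

Notice that only the server ever writes to this stream; anything the client wants to say has to travel in a separate, ordinary HTTP request.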

While there are ways to get by without Web Sockets, there are certain problems that Web Sockets solve far more naturally. Let’s look at the kinds of problems we should look to solve with them.

When to use Web Socket over HTTP?

Because HTTP-based solutions are stateless and uni-directional, the server sends something to the client only when the client requests it. If you are building an app that shows a user profile when the user visits the profile page, it makes sense to keep things simple with HTTP request-response.

Because Web Sockets are stateful and bi-directional, the server can also send data to the client without the client explicitly requesting it. When you need to build an application that depends on push-based, real-time communication, Web Sockets should be used. Think about getting the latest updates on social media without refreshing a page, collaborative editing on a document, or even a chat room: all of these use cases need real-time, bi-directional communication.

How do Web Sockets work?

A Web Socket connection between the client and the server is persistent. It remains an open TCP connection until either the client or the server closes it. A plain HTTP exchange, by contrast, is finished once the response is received, and the connection is then either closed or left idle until the client's next request.

How does this connection happen? The connection starts with a handshake between the client and the server. The client sends an HTTP request with a few additional headers that tell the server this is a request for a Web Socket handshake. The headers are Connection: Upgrade, Upgrade: websocket, and Sec-WebSocket-Key, whose value is a Base64-encoded random value.

Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Key: h44jdY83u63bgsRwsjUSHw==
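
For illustration, a client could assemble such a request roughly as follows. This is a sketch, not a complete client: the function name build_handshake_request and the host and path are made up for the example, and it also includes the Sec-WebSocket-Version: 13 header, which is not listed above but which RFC 6455 requires.

```python
import base64
import os

def build_handshake_request(host, path="/"):
    """Return (request_text, key) for a Web Socket opening handshake."""
    # Sec-WebSocket-Key is 16 random bytes, Base64 encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: Upgrade",
        "Upgrade: websocket",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",  # assumption: added per RFC 6455
    ]
    # HTTP header lines end with CRLF, and an empty line ends the header block.
    return "\r\n".join(lines) + "\r\n\r\n", key

request_text, key = build_handshake_request("example.com", "/chat")
print(request_text)
```

The key is freshly generated for every connection, so the printed request differs on each run.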

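On receiving this request, the server sees that Connection: Upgrade is present, checks the value of the Upgrade header, and sends back these headers in the response. The Sec-WebSocket-Accept value is derived from the client's Sec-WebSocket-Key, which lets the client confirm that the server really understood the handshake.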

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Accept: h32jsdlkjUUb8484Jsksu=
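
The article does not spell out how Sec-WebSocket-Accept is produced, so the sketch below follows RFC 6455: the server appends a fixed GUID to the key it received, takes the SHA-1 hash, and Base64 encodes the result. The function name compute_accept is made up for this example; the sample key and expected value are the ones given in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every server appends the same value.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def compute_accept(sec_websocket_key):
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    combined = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(combined).digest()
    return base64.b64encode(digest).decode("ascii")

# Sample key from RFC 6455, section 1.3.
accept = compute_accept("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client runs the same calculation on the key it sent and rejects the connection if the server's value does not match.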

The other thing to note here is that the client and server should agree on the format of the data they will exchange, known as a subprotocol. The client already knows which formats it understands, so it sends the possible options to the server, and the server has to choose one from the list; if the two sides cannot agree, the connection risks being dropped.

The header passed from the client is called Sec-WebSocket-Protocol, with possible options like this: Sec-WebSocket-Protocol: chat,wamp. The server responds with one of the options from the list in the same header.
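
On the server side, the selection could look something like the sketch below. The function name choose_subprotocol and the set of supported names are hypothetical, and picking the first match in the client's order is just one reasonable policy, not something the protocol mandates.

```python
def choose_subprotocol(header_value, supported):
    """Pick the first subprotocol offered by the client that the server supports."""
    offered = [name.strip() for name in header_value.split(",") if name.strip()]
    for name in offered:
        if name in supported:
            return name
    return None  # no overlap: the server sends no Sec-WebSocket-Protocol header

chosen = choose_subprotocol("chat,wamp", supported={"wamp", "mqtt"})
print(chosen)  # wamp
```

When the function returns None, it is up to the application to decide whether to continue without a subprotocol or to end the connection.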


How to implement Web Sockets?

There are open-source libraries for implementing Web Sockets in most languages. A quick search will turn up options for Python, Go, Java, JavaScript, and Ruby.

Specifically for Ruby, there are a few open-source libraries that we could use to integrate WebSockets in Rails. But ever since Rails 5 shipped with ActionCable, integrating WebSockets has become quite easy and feels very much like coding the Rails way.

The other open-source options are:

https://www.github.com/ruby-jp/websocket-client-simple
https://www.github.com/igrigorik/em-websocket

I will write a separate article on how to use ActionCable to solve problems via WebSockets.

I hope you enjoyed reading this and gained some insight into Web Sockets. To read further on the topic, here are some of the articles that were referred to while writing this post.

https://www.twilio.com/docs/glossary/what-are-websockets
https://www.wallarm.com/what/a-simple-explanation-of-what-a-websocket-is
https://ably.com/topic/websockets