Peer-Signaling Service

Introduction

As the world reacts to the COVID-19 global pandemic, many workers find themselves quarantined and forced to work remotely. Most companies employ third-party tools to support video chat, conferencing, and screen-sharing. While this is a sensible tactical approach, it may become necessary to integrate these services more tightly with your applications.

In this article, we will add a service to the reference architecture that supports a simple video-chat feature. Video-Chat allows an application client's user to place a video-chat call to any other user currently logged into the application. To accomplish this, the client leverages the WebRTC support available in most HTML5-compliant browsers. WebRTC is a mature mechanism that provides real-time communications to support peer-to-peer transport of voice, video, and data. Instead of requiring all video/voice/data to pass through your data center or cloud provider, it flows directly between application clients. This approach significantly reduces the application's bandwidth, CPU, and memory consumption, which translates to increased application capacity and lower bandwidth and CPU costs. While the vast majority of the Video-Chat code resides in the browser client, it does require that we provide a peer-signaling service on the back end to handle session registration, user discovery, and call-setup tasks.

Requirements

Before you get started, you will need the following:
  • Java
  • Maven
  • Docker
  • Docker-Compose
Refer to the Development Toolbox article if you do not have these installed locally.

WebRTC

This article will not be an in-depth tutorial on WebRTC. There is already a wealth of books, tutorials, and websites devoted to the topic (check the resource pages). However, for those unfamiliar with the technology, we will provide a brief introduction.

What is WebRTC

WebRTC is an open-source mechanism for real-time, peer-to-peer video, audio, and data communication, available in modern HTML5-compliant browsers and on mobile devices. It is standards-based and requires no plugins or external applications.

Use Cases

With WebRTC, we are able to build various real-time-streaming applications, including:
  • User-to-User Video calls: Two parties can transmit voice and video between their client applications.
  • Video-conference calls: Multiple parties can transmit both voice and video between each other. Usually, one party acts as the conference facilitator and is responsible for creating a virtual conference session. Additional parties join and leave the active conference session while the facilitator is active. When a party joins the conference, it will establish a connection with each active participant.
  • Webcast: As with video-conference calls, a single party acts as a facilitator by creating a session. However, unlike the video-conference model, only the facilitator broadcasts their voice and video to the other active parties.
  • Screen-Sharing: In this use case, a single user can share the contents of their desktop with one or more other users.
  • Peer-To-Peer Data connections: In addition to being able to broadcast audio and video data streams, WebRTC also allows us to create a data connection for application-specific peer-to-peer data transport (e.g., text chat, file transfer, etc.). The data connection can be used independently or in conjunction with audio/video streaming.

For this article, we will be implementing the User-To-User Video Call. We chose this approach for the following reasons:
  1. It demonstrates the core functionality required to transmit voice and video.
  2. It is (relatively) small enough to be easily comprehended.
  3. The resource demand on the user's browser is constrained. For each concurrent audio/video connection, the browser must handle both inbound and outbound stream processing as well as video rendering and audio playback. As the use-case complexity increases, the processing and bandwidth required of the client application grow with each additional connection.
  4. Currently, screen-casting is not implemented across browsers in a consistent fashion.

    What we will build

    Peer-Signaling Service

    While WebRTC provides client-side real-time communication support, it doesn't provide a mechanism to handle session registration, peer discovery, and call setup. To address this deficiency, we will build the Peer-Signaling Service. The service allows clients to register and de-register their availability, and to initiate and cancel peer connections. The service leverages Spring's WebSocket support to provide a persistent connection between client and server. The service acts both as a connection registry identifying active users and as a message router between users.



    Building the Peer-Signaling Service

    Peer-Signaling Service Hexagonal diagram

    The architecture of the PeerSignalingService differs from the previous services that we have built. Unlike our past services, it does not expose a REST endpoint. Instead, the service employs a WebSocket endpoint that provides a long-lived, full-duplex communication channel between the application client and the service. By using a WebSocket, the service can both receive and originate messages. Allowing the service to originate messages obviates the need for the application client to poll the service for incoming messages.

    The source

    Maven pom.xml

    loading...

    Service Configuration

    bootstrap.yml

    loading...

    This bootstrap file differs little from those of our previous services, apart from the spring.application.name value.

    application.yml

    loading...


    JWT Authorization

    Since the PeerSignalingService is built using WebSockets, we must modify our approach to JWT authorization slightly. Unlike the HTTP clients that call our REST services, the browser WebSocket API does not let the client add arbitrary headers to the connection request. This limitation prevents an application client from simply passing the JWT authentication token in the Authorization header. Fortunately, the IETF WebSocket protocol standard (RFC 6455) defines the Sec-WebSocket-Protocol header, which allows the client to pass a list of subprotocols during the connection handshake. Since the PeerSignaling Service does not require subprotocol negotiation, we are free to use this header to transport our JWT authentication token. We can accomplish this with a few minor adjustments to the service's JWTAuthorizationFilter.

    loading...

    Here we extract the sec-websocket-protocol header value from the request and verify that it is not null and contains two values. We then parse the second value and convert it into a JWT instance that the service can authorize.
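The header check described above can be sketched with the JDK alone. This is a minimal illustration, not the service's actual filter: the class name SubprotocolTokenExtractor, the method extractToken, and the reading of "contains two values" as "exactly two comma-separated values" are all assumptions, and JWT parsing and signature verification are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical helper; the real filter's internals are not shown in this article. */
final class SubprotocolTokenExtractor {

    private SubprotocolTokenExtractor() {
    }

    /**
     * Returns the second comma-separated value of a Sec-WebSocket-Protocol
     * header, or empty if the header is null or does not hold exactly two values.
     */
    static Optional<String> extractToken(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        // Subprotocols arrive as a comma-separated list, e.g. "name, token".
        List<String> values = Arrays.stream(headerValue.split(","))
                .map(String::trim)
                .filter(value -> !value.isEmpty())
                .collect(Collectors.toList());
        if (values.size() != 2) {
            return Optional.empty();
        }
        // Assumption: the first value is a marker name, the second is the JWT.
        return Optional.of(values.get(1));
    }
}
```

A caller would pass the raw header value to extractToken and reject the handshake when the result is empty; otherwise the returned string is handed to whatever JWT library the service already uses.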

    Peer Socket Handler

    loading...

    The PeerSocketHandler provides the service's application client endpoint. It extends Spring's TextWebSocketHandler class to provide the transmission and receiving channels. Unlike our earlier REST services, each client application maintains a connection to the PeerSocketHandler for the duration of the session. Rather than calling discrete endpoint methods, the client transmits messages to the service across the WebSocket connection, and the PeerSocketHandler processes these messages. Messages are either processed by the service, or routed to another application client.

    When a message arrives at the PeerSocketHandler, it is represented as a text string. The service uses a simple message framing in the form:

    Position  Content                 Description
    0         Event Type              The event type identifier (e.g., register-user, candidate, etc.).
    1         Destination Session Id  Routed messages include the destination session id to allow the service to look up the destination session.
    2         JSON message            The message payload.


    The text string is split on the pipe character into an array with the event type at index 0, the destination session id at index 1, and the JSON message at index 2. A switch statement evaluates the event type and invokes the corresponding handler method. Each handler method converts the JSON into the appropriate class instance it must use for processing the message. The service supports the following message event types:

    • register-user: registers a WebSocket connection to a particular user.
    • de-register-user: de-registers a WebSocket connection.
    • get-active-users: returns a list of currently active users.
    • initiate-call: routes a call request to the desired end user's WebSocket session.
    • call-request: notifies the desired end user that a call request is being made.
    • call-request-response: provides a reply to the call initiator indicating acceptance or rejection of the call request.
    • accept-call: notifies the caller that the call request has been accepted.
    • offer: routes WebRTC offer messages to the designated callee.
    • answer: routes WebRTC answer messages to the caller.
    • candidate: routes WebRTC candidate messages to the designated callee.
    • call-request-canceled: routes a call cancellation message to the designated caller prior to call setup. This allows the caller to cancel a call prior to the callee accepting the call.
    • remote-hangup: routes a message to the other calling party indicating that the call has been terminated.
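The framing and switch-based dispatch described above can be sketched as follows. The class SignalingFrame and its dispatch method are hypothetical, and the split between locally handled and routed events is an assumption drawn from the event descriptions; the real handler invokes a dedicated method per event type.

```java
/** Hypothetical parser for the "eventType|destinationSessionId|json" frame layout. */
final class SignalingFrame {

    final String eventType;
    final String destinationSessionId;
    final String json;

    private SignalingFrame(String eventType, String destinationSessionId, String json) {
        this.eventType = eventType;
        this.destinationSessionId = destinationSessionId;
        this.json = json;
    }

    static SignalingFrame parse(String text) {
        // A limit of 3 keeps any pipe characters inside the JSON payload intact.
        String[] parts = text.split("\\|", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 pipe-delimited fields");
        }
        return new SignalingFrame(parts[0], parts[1], parts[2]);
    }

    /** Reports how a frame would be treated (assumed grouping of event types). */
    static String dispatch(SignalingFrame frame) {
        switch (frame.eventType) {
            case "register-user":
            case "de-register-user":
            case "get-active-users":
                return "handled-by-service";
            case "initiate-call":
            case "call-request":
            case "call-request-response":
            case "accept-call":
            case "offer":
            case "answer":
            case "candidate":
            case "call-request-canceled":
            case "remote-hangup":
                return "routed-to:" + frame.destinationSessionId;
            default:
                return "unknown-event";
        }
    }
}
```

Splitting with a limit is a deliberate choice here: SDP and ICE payloads are free-form text, so an unbounded split could break a payload that happens to contain a pipe character.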


    Peer Signaling models

    Peer User

    loading...

    The PeerUser class represents a unique client application user.

    Session Routing Info

    loading...

    The SessionRoutingInfo class provides the information needed by the service to route the message to the destination session.

    Signaling Data Service

    loading...

    The SignalingDataService defines the interface for managing peer-signaling data. It provides methods for:
    • registering and de-registering peer users.
    • adding and removing signaling sessions.
    • looking up a PeerUser.
    • looking up a WebSocketSession.
    • listing the active PeerUser instances.
    • listing the active WebSocketSession instances.


    Single Node Signaling Data Service Impl

    loading...

    The SingleNodeSignalingDataServiceImpl provides an implementation of the SignalingDataService. We use this implementation when deploying a single instance of the service.
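To make the single-node idea concrete, here is a simplified, in-memory sketch of such a registry. It is not the article's SignalingDataService: the interface SignalingRegistry, its method signatures, and the use of plain string ids are assumptions, and the session type is left generic (S) so the example compiles without Spring.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Simplified, hypothetical stand-in for the article's SignalingDataService. */
interface SignalingRegistry<S> {
    void addSession(String sessionId, S session);
    void removeSession(String sessionId);
    boolean registerPeerUser(String userId, String sessionId);
    void deRegisterUser(String userId);
    Optional<S> findSessionById(String sessionId);
    List<String> activeUserIds();
}

/** Keeps all state in local maps, so it only works for a single service instance. */
final class InMemorySignalingRegistry<S> implements SignalingRegistry<S> {

    private final Map<String, S> sessionsById = new ConcurrentHashMap<>();
    private final Map<String, String> sessionIdByUserId = new ConcurrentHashMap<>();

    @Override
    public void addSession(String sessionId, S session) {
        sessionsById.put(sessionId, session);
    }

    @Override
    public void removeSession(String sessionId) {
        sessionsById.remove(sessionId);
        // Drop any user that was bound to the closed session.
        sessionIdByUserId.values().removeIf(sessionId::equals);
    }

    @Override
    public boolean registerPeerUser(String userId, String sessionId) {
        if (!sessionsById.containsKey(sessionId)) {
            return false;
        }
        // putIfAbsent is atomic; a non-null result means the user is already registered.
        return sessionIdByUserId.putIfAbsent(userId, sessionId) == null;
    }

    @Override
    public void deRegisterUser(String userId) {
        sessionIdByUserId.remove(userId);
    }

    @Override
    public Optional<S> findSessionById(String sessionId) {
        return Optional.ofNullable(sessionsById.get(sessionId));
    }

    @Override
    public List<String> activeUserIds() {
        return sessionIdByUserId.keySet().stream().sorted().collect(Collectors.toList());
    }
}
```

Because the maps live in one JVM, a second service instance would not see these sessions, which is the limitation a distributed implementation has to address.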

    Note: It is important to remember that each WebSocket connection consumes resources, and the number of connections that an instance of the Peer-Signaling Service can support is limited by the machine on which it is deployed. When deploying large-scale Peer-Signaling Service support, we will deploy multiple instances of the PeerSignaling Service configured with a MultiNodeSignalingDataServiceImpl. In this implementation, the PeerUser and WebSocketSession data is maintained in a distributed data grid. By distributing the data across the grid, any instance of the PeerSignaling Service can route messages to every other instance.

    Docker Compose

    We continue the process of extending our Docker-Compose file. Here we include our Peer-Signaling Service.

    loading...
    To run the application with the Peer-Signaling Service, copy the Docker-Compose file above to your local machine and run it from the command prompt:

     docker-compose -f ./dc-13-signaling-service.yml up -d

    Peer-Signaling Service In Action

    Unlike our previous services, the PeerSignaling Service communicates with its clients over a WebSocket session. Instead of calling separate endpoint methods, we connect to the service once and transmit and receive messages across that connection.

    WebSocket Connection

    Connection sequence diagram
    When an application client's WebSocket connects to the PeerSocketHandler, the afterConnectionEstablished method is invoked. The PeerSocketHandler then calls the SignalingDataService implementation to add the new session.

    WebSocket Disconnection

    Disconnection sequence diagram

    After the application client's WebSocket disconnects (either on logout, application close, or network error), the PeerSocketHandler invokes its afterConnectionClosed method. This method first calls the SignalingDataService implementation's removePeerUserSession method to remove the session and PeerUser references. The PeerSocketHandler then calls its broadcast method with the list of active peer users to notify the remaining application clients of the disconnect.
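The remove-then-broadcast sequence can be sketched without Spring by modelling each session's outbound channel as a Consumer<String>. Everything named here is hypothetical: the class DisconnectBroadcaster, the "active-users" event name, and the JSON array payload are illustrative, since the article does not show the actual broadcast message format.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.stream.Collectors;

/** Hypothetical sketch of removing a closed session and notifying the rest. */
final class DisconnectBroadcaster {

    // sessionId -> outbound channel (stand-in for a WebSocket session's send method)
    private final Map<String, Consumer<String>> channelsBySession = new ConcurrentHashMap<>();
    // sessionId -> registered user id
    private final Map<String, String> userBySession = new ConcurrentHashMap<>();

    void connect(String sessionId, String userId, Consumer<String> channel) {
        channelsBySession.put(sessionId, channel);
        userBySession.put(sessionId, userId);
    }

    void onConnectionClosed(String sessionId) {
        // Remove the session and its user reference first...
        channelsBySession.remove(sessionId);
        userBySession.remove(sessionId);

        // ...then send the updated user list to every remaining session.
        String users = userBySession.values().stream()
                .sorted()
                .map(user -> "\"" + user + "\"")
                .collect(Collectors.joining(",", "[", "]"));
        String frame = "active-users||" + users; // hypothetical event name and payload
        channelsBySession.values().forEach(channel -> channel.accept(frame));
    }
}
```

Removing the session before broadcasting matters: it keeps the closed connection out of both the recipient set and the user list that the remaining clients receive.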

    Registration

    Registration sequence diagram
    Registration is the process of associating a user account with an active WebSocket session. Registration is initiated when the application client sends a register-user message to the PeerSocketHandler. The PeerSocketHandler invokes its registerUser method, which in turn calls the SignalingDataService's findPeerUserById method to check if the user has already registered. Currently, the service does not support multiple concurrent sessions for the same user. This restriction prevents message routing issues arising from ambiguity about the destination session. If the user is already registered, the PeerSocketHandler will send a duplicate-registration message to the application client attempting to register. Otherwise, the PeerSocketHandler will invoke the SignalingDataService's registerPeerUser method, which will associate the current WebSocket session with the registered user. The PeerSocketHandler then invokes its broadcast method to notify all active sessions of the updated activeUserList.

    De-Registration

    De-Registration sequence diagram

    De-Registration is the process of removing a user's active WebSocket session from the service. When the application client sends a de-register-user message, the PeerSocketHandler calls the SignalingDataService's deRegisterUser method to remove the associated user and session. When the user and session have been removed, the PeerSocketHandler calls its broadcast method to notify the remaining active sessions of the updated active session list.

    Message Routing

    In addition to connection and session registration, the PeerSignalingService also provides a simple message routing mechanism. Routed messages include a DestinationSessionId, which the service uses to find the destination session. Once the destination session instance has been obtained, the original message is then sent to that session.

    Initiate-Call message routing

    The initiate-call message is representative of the service's routing mechanism. In the sequence below, we see two application clients register with the service. Application Client 2 initiates a call by sending the initiate-call message to the service.

    Initiate-Call Message Routing sequence diagram

    The PeerSocketHandler processes the initiate-call message by first looking up the supplied destinationSessionId through the SignalingDataService's findWebSocketSessionById method. Once the session has been obtained, the service calls the session's sendMessage method with the contents of the original message. Application Client 1 will then receive the message on its WebSocket's inbound channel.
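The lookup-and-forward step can be sketched as below. The class MessageRouter is hypothetical, and a Consumer<String> stands in for the destination WebSocket session's sendMessage call so that the example runs on the JDK alone; the behaviour for an unknown destination (returning false) is also an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical router that forwards a frame, unchanged, to its destination session. */
final class MessageRouter {

    // sessionId -> outbound channel (stand-in for WebSocketSession.sendMessage)
    private final Map<String, Consumer<String>> sessionsById = new ConcurrentHashMap<>();

    void addSession(String sessionId, Consumer<String> channel) {
        sessionsById.put(sessionId, channel);
    }

    /** Returns true if the message was delivered, false if it could not be routed. */
    boolean route(String rawMessage) {
        // Frame layout: eventType|destinationSessionId|json
        String[] parts = rawMessage.split("\\|", 3);
        if (parts.length != 3) {
            return false;
        }
        Consumer<String> destination = sessionsById.get(parts[1]);
        if (destination == null) {
            return false; // unknown or already disconnected session
        }
        // The original message is passed through as-is.
        destination.accept(rawMessage);
        return true;
    }
}
```

Forwarding the raw text rather than re-serializing it keeps the service agnostic to the payload, so new routed event types need no server-side model classes in this sketch.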

    Metrics and Monitoring

    The Peer-Signaling Service generates the following service-level metrics:
    • peer.signaling.connection.total
    • peer.signaling.disconnection.total
    • peer.signaling.peer.register.user.total
    • peer.signaling.peer.deregister.user.total
    • peer.signaling.peer.accept.call.total
    • peer.signaling.peer.answer.total
    • peer.signaling.peer.call.request.canceled.total
    • peer.signaling.peer.call.request.response.total
    • peer.signaling.peer.call.request.total
    • peer.signaling.peer.candidate.total
    • peer.signaling.peer.offer.total
    • peer.signaling.peer.remote.hangup.total
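As a rough illustration of how such counters accumulate, the sketch below keeps one thread-safe counter per metric name using only the JDK. It is an assumption-laden stand-in: the class SignalingCounters is hypothetical, and the article does not show which metrics library the service uses or how the counters are exported.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-process counters keyed by metric name. */
final class SignalingCounters {

    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Increments the named counter, creating it on first use. */
    void increment(String metricName) {
        counters.computeIfAbsent(metricName, name -> new LongAdder()).increment();
    }

    /** Returns the current total, or 0 if the counter was never incremented. */
    long count(String metricName) {
        LongAdder adder = counters.get(metricName);
        return adder == null ? 0L : adder.sum();
    }
}
```

A handler would call increment("peer.signaling.connection.total") when a connection is established, and likewise for each of the other names in the list above.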

    To visualize this data, we will import the Peer-SignalingDashboard.json file from the ThinkMicroservices GitHub Dashboards repository. The dashboard should appear as:

    Grafana Peer-Signaling Dashboard

    Resources



    Coming Up

    The Peer-Signaling Service is the last core service of our reference implementation. These services can be used as provided or modified to suit your use cases. In our next article, we will shift our perspective from the service-side to the client-side by looking at an example Angular Web Client designed to work with the services we have just described.