Fun with XMPP and Google Talk
How often, when signing in to GTalk (Google Talk), have you wondered what might be happening under the hood? Have you thought about what this little application might be cooking up as you type those letters? How can it tell you that your friend is typing just as she has, in fact, started typing? How does it manage to show all that real-time presence information?
Well, one day, I got irresistibly curious and decided to open it up! In this two-part article, I share my thrilling adventures as I unravel how GTalk does what it is best at: communication.
First of all, let's remember that GTalk, like any such communication application, has to be a socket program at its core. A socket program is a networking program that exchanges data over a network connection, usually following a specific protocol. TCP/IP is the most widely used and supported protocol suite on the internet. Most of the protocols we come across in our mundane lives, such as HTTP, FTP and SMTP, are layered on top of TCP/IP.
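To make "socket program" concrete, here is a minimal sketch in Python. It is not how GTalk is written. It only shows two ends of a TCP connection on the local machine passing bytes to each other. The function name send_over_loopback is made up for this illustration.

```python
import socket


def send_over_loopback(payload: bytes) -> bytes:
    """Send payload over a local TCP connection and return what the other end reads."""
    # Listening end: bind to the loopback interface on a free port chosen by the OS.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        host, port = listener.getsockname()

        # Connecting end: this is the role a chat client plays.
        with socket.create_connection((host, port)) as client:
            conn, _addr = listener.accept()
            with conn:
                client.sendall(payload)
                # Close the sending direction so the reader sees end-of-stream.
                client.shutdown(socket.SHUT_WR)
                received = []
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    received.append(chunk)
                return b"".join(received)
```

Everything a protocol such as HTTP or SMTP adds is an agreement about what those bytes mean.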
Identifying the protocol GTalk uses
Having established that, our next task is to find out which protocol GTalk actually uses. There are two ways to do this. The first is to simply query Google itself for the answer. The second is a little more complex and hence more thrilling. I shall stick with the second one.
There are a number of network tracing and analysis tools available on the internet. These are powerful tools capable of revealing magical details about protocols, TCP/IP packets and so on. The one I have chosen is called Ethereal. It is a highly sophisticated tool that can analyze live TCP/IP packets. What's more, it even comes with a UI for doing all this, and the best part is that it is free, open source software.
After setting up Ethereal properly, I open the GTalk client and sign in while a live capture of TCP/IP packets is in progress. Here's what I get.
Screenshot of Ethereal protocol analyzer live trace showing Jabber packets. Observe rows numbered 120 and 122.
It's a widely known fact, perhaps, that GTalk always connects to its server, talk.google.com. When we ping talk.google.com, we see its IP address, which resolves to 220.127.116.11 on my computer as of today. When we look back at the trace above, we find rows whose "Destination" column has the value 18.104.22.168 (observe rows numbered 120 and 124). The "Source" column for these rows reads 192.168.200.190, which happens to be my computer's IP address! We can thus conclude that the rows we are looking at are the TCP/IP packets sent from my computer to the GTalk server, talk.google.com!
Similarly, rows such as 122, 128 and 131 are packets that talk.google.com has sent back to my computer in response to its requests. Now that we've identified the important packets, it's a simple matter to read the value of the "Protocol" column: it is Jabber.
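The same reasoning can be written down as a small filter. The sketch below is only an illustration: the rows are re-typed from the description above rather than loaded from a capture file, row 999 is a hypothetical unrelated packet, and the names TraceRow, direction and conversation_protocols are invented for this example.

```python
from typing import Iterable, NamedTuple, Set


class TraceRow(NamedTuple):
    number: int
    source: str
    destination: str
    protocol: str


CLIENT_IP = "192.168.200.190"   # my computer, as seen in the "Source" column
SERVER_IP = "18.104.22.168"     # the address seen in the "Destination" column

# Rows re-typed from the text, not read from a real capture file.
ROWS = [
    TraceRow(120, CLIENT_IP, SERVER_IP, "Jabber"),
    TraceRow(122, SERVER_IP, CLIENT_IP, "Jabber"),
    TraceRow(124, CLIENT_IP, SERVER_IP, "Jabber"),
    TraceRow(128, SERVER_IP, CLIENT_IP, "Jabber"),
    TraceRow(131, SERVER_IP, CLIENT_IP, "Jabber"),
    # Hypothetical unrelated traffic, included only to show it being ignored.
    TraceRow(999, CLIENT_IP, "192.0.2.1", "HTTP"),
]


def direction(row: TraceRow, client_ip: str, server_ip: str) -> str:
    """Classify a row by who sent it, looking only at source and destination."""
    if row.source == client_ip and row.destination == server_ip:
        return "client -> server"
    if row.source == server_ip and row.destination == client_ip:
        return "server -> client"
    return "other"


def conversation_protocols(rows: Iterable[TraceRow], client_ip: str, server_ip: str) -> Set[str]:
    """Collect the "Protocol" values of every row exchanged between the two hosts."""
    return {
        row.protocol
        for row in rows
        if direction(row, client_ip, server_ip) != "other"
    }
```

Reading the "Protocol" column by eye and calling conversation_protocols(ROWS, CLIENT_IP, SERVER_IP) amount to the same step.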
More about Jabber (and XMPP)
Quoting from the home of Jabber, www.jabber.org:
Jabber is best known as "the Linux of instant messaging" — an open, secure,
ad-free alternative to consumer IM services like AIM, ICQ, MSN, and Yahoo. Under the hood, Jabber is a set of
streaming XML protocols and technologies that enable any two entities on the Internet to exchange messages,
presence, and other structured information in close to real time.
Jabber defines a host of sub-protocols, with XMPP at their core. XMPP stands for eXtensible Messaging and Presence Protocol. In October 2004, it was adopted by the IETF, and the core specification is available as RFC 3920.
Covering XMPP specifics is beyond the scope of this article. In summary, XMPP is an XML-based communications protocol, which means that all requests and responses are XML. The GTalk client sends its requests as XML messages to its server at talk.google.com and receives the responses as XML messages too. In the next section, we shall see what forms the essence of XMPP while we try out some experiments.
Raw XMPP communication with talk.google.com
How about talking to the GTalk server in its native language? Well, this is not as far-fetched as it sounds. But before we attempt anything like that, we should understand the nature of XMPP.
XMPP defines two fundamental terms with respect to messaging — Streams and Stanzas. Here's how we can define them:
Stream: A Stream is an open XML envelope that one entity sends to another before exchanging further XML elements. The entities can be the client or the server. These XML elements are known as stanzas, as we learn in the next definition. A Stream is always the root element of its XML document. It starts with an optional XML declaration (the prolog), followed by an unterminated <stream:stream> opening tag. The Stream carries other information such as the server it is addressed to, the version of the protocol used and various namespace declarations.
Stanza: A Stanza is a specific, well-formed and complete XML element which either of the entities sends within an already open XML Stream. Stanzas are always first-level children of the Stream's root element. XMPP Core defines three types of Stanzas, viz. <presence/>, <message/> and <iq/>.
Entities can send any number of these Stanzas within an open Stream. All other information is sent as nested elements or attributes of these core Stanzas. Further details of these Stanzas are again beyond the scope of this article.
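The difference between a Stream and a Stanza shows up clearly when one direction of the traffic is parsed incrementally. The sketch below uses XMLPullParser from Python's standard library and tracks element depth. Depth 1 is the never-closed stream root and every element that completes at depth 2 is a Stanza. The function name iter_stream_events and the sample chunks are assumptions made for this illustration, with addresses and the message body borrowed from the RFC-style examples.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Tuple

# One direction of a session, split into arbitrary chunks as a socket might deliver it.
SAMPLE_CHUNKS = [
    '<?xml version="1.0"?><stream:stream to="example.com" xmlns="jabber:client" '
    'xmlns:stream="http://etherx.jabber.org/streams" version="1.0">',
    '<message to="someone@example.com" xml:lang="en"><body>Art thou not Romeo, ',
    'and a Montague?</body></message>',
    '<presence/>',
]


def iter_stream_events(chunks: Iterable[str]) -> Iterator[Tuple[str, ET.Element]]:
    """Yield ("stream-open", root) once, then ("stanza", element) per complete Stanza."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        # Newer Expat versions may defer parsing of incomplete tokens;
        # flush(), where the Python version provides it, asks for immediate parsing.
        if hasattr(parser, "flush"):
            parser.flush()
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    # The stream root has opened; it stays open for the whole session.
                    yield ("stream-open", elem)
            else:
                depth -= 1
                if depth == 1:
                    # A first-level child has just closed: one complete Stanza.
                    yield ("stanza", elem)
```

Note that the parser is never told the document has ended, which mirrors the Stream being left open for as long as the session lasts.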
Now let's see what happens in a simple session between the client and the server. The following shows a typical interaction. For ease of reading, lines sent by the client are prefixed with C: and those sent by the server with S:. This and many more such examples can be found in the specification of XMPP Core, RFC 3920.
C: <?xml version="1.0"?> <stream:stream to="example.com" xmlns="jabber:client" xmlns:stream="http://etherx.jabber.org/streams" version="1.0">
S: <?xml version="1.0"?> <stream:stream from="example.com" id="someid" xmlns="jabber:client" xmlns:stream="http://etherx.jabber.org/streams" version="1.0">
... encryption, authentication, and resource binding ...
C: <message from="email@example.com" to="firstname.lastname@example.org" xml:lang="en"> <body>Art thou not Romeo, and a Montague?</body> </message>
S: <message from="email@example.com" to="firstname.lastname@example.org" xml:lang="en"> <body>Neither, fair saint, if either thee dislike.</body> </message>
Typical interaction.
Looking at the pattern of messages, we can tell that two separate XML documents are involved here: one that the client opens and eventually terminates, and another that the server opens and closes. During an interaction, however, the fragments of these two documents are interspersed.
Now let's try some talking with talk.google.com. XMPP will be our language and TCP/IP our medium.
To communicate raw with any server in its native protocol, we need to be able to open a terminal session to a specific port on the server. We can use any of the available telnet clients, such as Microsoft Telnet or PuTTY, to do this. I chose PuTTY for historical reasons.
Let's first configure PuTTY to open a raw connection to talk.google.com on port 5222. Note that 5222 is the non-SSL port that the Jabber protocol uses. If we were to use 5223, the SSL-enabled port, we would have difficulty with our raw communication because everything sent over it is encrypted.
Screenshot of PuTTY showing configuration to talk.google.com on port 5222
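For readers who would rather script this than click through PuTTY, the following is a rough Python equivalent of a raw connection, offered as a sketch under assumptions. The helper names stream_header and open_stream are invented here, the header mirrors the RFC-style example shown earlier, and the server may no longer answer on this port at all.

```python
import socket
from xml.sax.saxutils import quoteattr


def stream_header(domain: str) -> bytes:
    """Build the opening of an XMPP stream addressed to the given domain."""
    return (
        '<?xml version="1.0"?>'
        # quoteattr adds the surrounding quotes and escapes special characters.
        f'<stream:stream to={quoteattr(domain)} xmlns="jabber:client" '
        'xmlns:stream="http://etherx.jabber.org/streams" version="1.0">'
    ).encode("utf-8")


def open_stream(host: str, port: int, domain: str, timeout: float = 10.0) -> str:
    """Open a plain TCP connection, send the stream header and return the first reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(stream_header(domain))
        # One read is enough for a first look; a real client would keep reading.
        return sock.recv(4096).decode("utf-8", errors="replace")
```

Calling open_stream("talk.google.com", 5222, "gmail.com") would correspond to typing the stream header into the PuTTY window by hand.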
Here's another screenshot of the actual raw XMPP communication we've been talking about till now. The first and third XML fragments are sent by the client (us) while the second and fourth are sent by the server, talk.google.com.
Screenshot of PuTTY showing raw XMPP interaction with talk.google.com.
We first initiate the stream with the "to" attribute of the Stream set to "gmail.com". The server then acknowledges the request by opening a Stream of its own, enumerating the features it supports and the method of encryption it mandates. The <starttls xmlns="urn:ietf:params:xml:ns:xmpp-tls"><required/></starttls> element indicates that the server requires TLS. The client acknowledges by sending another fragment, the <starttls xmlns="urn:ietf:params:xml:ns:xmpp-tls"/> element, indicating that it agrees to start a TLS negotiation.
After this, the server acknowledges in turn by telling the client to proceed with the TLS negotiation. This is followed by a SASL negotiation. Here again, we reach the boundaries of our scope.
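The STARTTLS exchange described above can be sketched as three small helpers. This is an illustration only. The function names are made up, the features fragment is passed in as a standalone string with its stream namespace declared, and the <proceed/> reply checked at the end is the element RFC 3920 defines for this step, which is not shown in the screenshot.

```python
import xml.etree.ElementTree as ET

TLS_NS = "urn:ietf:params:xml:ns:xmpp-tls"


def tls_requirement(features_xml: str) -> str:
    """Return "required", "optional" or "not offered" for a <stream:features/> fragment."""
    features = ET.fromstring(features_xml)
    starttls = features.find(f"{{{TLS_NS}}}starttls")
    if starttls is None:
        return "not offered"
    if starttls.find(f"{{{TLS_NS}}}required") is not None:
        return "required"
    return "optional"


def starttls_request() -> str:
    """The fragment the client sends to agree to a TLS negotiation."""
    return f'<starttls xmlns="{TLS_NS}"/>'


def server_says_proceed(reply_xml: str) -> bool:
    """True if the server's reply is the <proceed/> element in the TLS namespace."""
    return ET.fromstring(reply_xml).tag == f"{{{TLS_NS}}}proceed"
```

Once the server says proceed, the bytes on the connection stop being readable XML, which is exactly where a raw terminal session runs out of road.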
After the authentication phase, the client and the server can start exchanging XML Stanzas. We, however, can't reach this stage using the raw communication approach with the GTalk server: the TLS negotiation and the SASL handshake both require an understanding of complex encryption mechanisms, and the messages exchanged would no longer be human-readable.
So far, we have talked about identifying the protocols that applications use and discovered that GTalk is just another Jabber client. We learnt the basics of XMPP. We even successfully tried a preliminary raw XMPP conversation with talk.google.com. Next, we shall go a step further and see how we can exploit the wealth of features provided by XMPP to play with GTalk!
To know more, read Fun with XMPP and Google Talk, Part 2.
Google Talk - The Instant Messaging and VOIP client by Google, Inc.
Ethereal - A network protocol analyzer for Unix and Windows.
Jabber - A not-for-profit organization that oversees the general development of XMPP and maintains the Jabber Enhancement Proposals.
XMPP - eXtensible Messaging and Presence Protocol.
XMPP RFCs - The base specifications of the eXtensible Messaging and Presence Protocol.
RFC 3920 - Specification of XMPP Core.
PuTTY - A free implementation of Telnet and SSH for Win32 and Unix platforms, along with an xterm terminal emulator.
TLS - Transport Layer Security (RFC 2246).
SASL - Simple Authentication and Security Layer (RFC 2222).
The Early Bird Catches The Worm. Had it not been for friends who pointed out mistakes and helped me fix them soon after it was published, this article would still be in bad shape. My sincere thanks to my friends Amod Pandey, Bharati K and Hemanth H M.