Fun with XMPP and Google Talk

29th October 2007

How often, when signing into GTalk (Google Talk), have you wondered what might be happening under the hood? Have you thought about what this little application might be cooking as you type those letters? How can it tell you that your friend is typing just as she has, in fact, started typing? How does it manage to show all that real-time presence information?

Well, one day, I got irresistibly curious and decided to open it up! In this two part article, I share my thrilling adventures as I unravel the way GTalk does what it is best at — Communication.

The basics

First of all, let's remember that GTalk or any such communication application has to be just a socket program at the core. A socket program is a networking program which is usually targeted at a specific protocol. TCP/IP is the most widely used and supported communication protocol for the internet. Most of the protocols we come across in our mundane lives such as HTTP, FTP, SMTP etc, are all based on TCP/IP.
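To make the idea of a "socket program" concrete, here is a minimal sketch in Python. It is not GTalk's code. The names `start_echo_server` and `send_and_receive` are made up for illustration, and the server is a throwaway echo service on the local machine so that the example needs no network access.

```python
import socket
import threading


def start_echo_server():
    """Start a one-shot echo server on a free loopback port and return the port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve_one():
        conn, _addr = server.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(data)  # echo the bytes back unchanged
        server.close()

    threading.Thread(target=serve_one, daemon=True).start()
    return port


def send_and_receive(host, port, payload):
    """Open a TCP connection, send the payload, and return the reply bytes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
        return sock.recv(1024)


# Example use (illustrative):
#   port = start_echo_server()
#   reply = send_and_receive("127.0.0.1", port, b"hello")
```

Every protocol mentioned above, at its core, does what `send_and_receive` does: it opens a TCP connection, writes bytes, and reads bytes back. The protocol only decides what those bytes mean.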

Identifying the protocol GTalk uses

Having established that, our next task is to find out which protocol GTalk actually uses. There are two ways to do this. The first is to simply query Google itself for the answer. The second method is a little more complex and hence, thrilling. I shall stick with the second one.

There are a number of network tracing and analysis tools available on the internet. These are powerful tools capable of revealing magical details about protocols, TCP/IP packets and so on. The one I have chosen is called Ethereal. It is a highly sophisticated tool that can analyze live TCP/IP packets. What's more, it even comes with a UI for doing all this, and the best part is that it is free and open source software.

After having set up Ethereal properly, I open the GTalk client and sign in while live capture of TCP/IP packets is in progress. Here's what I get.

Screenshot of Ethereal protocol analyzer live trace showing Jabber packets. Observe rows numbered 120 and 122.

It's a widely known fact, perhaps, that GTalk always connects to the same server. When we ping that server, we get its IP address as resolved on my computer as of today. Looking back at the above trace, we find rows whose "Destination" column holds that address (observe rows numbered 120 and 124). The "Source" column for these rows holds my computer's IP address! Thus we can conclude that the rows we are looking at are the TCP/IP packets sent from my computer to the GTalk server!

Similarly, we can find rows such as 122, 128 and 131, which are packets the server has sent to my computer in response to its requests. Now that we've identified the important packets, it's a simple task to read the value of the "Protocol" column and say that it is Jabber.

More about Jabber (and XMPP)

Quoting from the home of jabber,

Jabber is best known as "the Linux of instant messaging" — an open, secure, ad-free alternative to consumer IM services like AIM, ICQ, MSN, and Yahoo. Under the hood, Jabber is a set of streaming XML protocols and technologies that enable any two entities on the Internet to exchange messages, presence, and other structured information in close to real time.

Jabber defines a host of sub-protocols, XMPP being the core of them. XMPP stands for eXtensible Messaging and Presence Protocol. In October 2004, it was adopted by the IETF community and the specification is available as an RFC numbered 3920.

Covering XMPP specifics is beyond the scope of this article. In summary, XMPP is an XML based communications protocol. This means all requests and responses travel as XML. The GTalk client sends requests as XML messages to its server and receives responses, also as XML messages. In the next section, we shall see what forms the essence of XMPP while we try out some experiments.

Raw XMPP communication with the GTalk server

How about talking to the GTalk server using its native language? Well, this is not as farfetched as it sounds. But before we attempt to do anything like that, we should understand the nature of the XMPP protocol.

XMPP defines two fundamental terms with respect to messaging — Streams and Stanzas. Here's how we can define them:

Stream:  A Stream is an open XML envelope sent before exchanging more XML elements between two entities. These entities can be either the client or the server. These XML elements are known as stanzas, as we learn in the next definition. Streams are always the root elements. They start with an optional XML declaration (prolog) followed by an opening <stream:stream> tag that stays unclosed until the session ends. The Stream carries other information such as the server it is addressed to, the version of the protocol used and various namespace declarations.

Stanza:  A Stanza is a specific, well formed and complete XML element which either of the entities sends within an already open XML Stream. Stanzas are always the first level children in the XML document. XMPP Core defines three types of Stanzas viz. <presence/>, <message/> and <iq/>.

Entities can send any number of these Stanzas within an open Stream. All other information is sent as nested elements or attributes of these core Stanzas. Further details of these Stanzas are again beyond the scope of this article.
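As a rough illustration of the three Stanza kinds, the sketch below builds each one with Python's standard `xml.etree.ElementTree` module. The helper names (`make_message`, `make_presence`, `make_iq`, `serialize`) and the address in the usage comment are assumptions made for this example, and the stanzas carry only a few of the attributes and child elements the specifications allow.

```python
import xml.etree.ElementTree as ET


def make_message(to, body):
    """Build a <message/> stanza with a single <body/> child."""
    msg = ET.Element("message", {"to": to})
    ET.SubElement(msg, "body").text = body
    return msg


def make_presence(show=None):
    """Build a <presence/> stanza, optionally with a <show/> child such as 'away'."""
    pres = ET.Element("presence")
    if show is not None:
        ET.SubElement(pres, "show").text = show
    return pres


def make_iq(iq_type, iq_id):
    """Build an empty <iq/> stanza carrying only its type and id attributes."""
    return ET.Element("iq", {"type": iq_type, "id": iq_id})


def serialize(stanza):
    """Return the stanza as a string, ready to be written into an open Stream."""
    return ET.tostring(stanza, encoding="unicode")


# Example use (hypothetical address):
#   serialize(make_message("romeo@example.net", "Hi"))
```

Each helper returns a complete, well formed element, which is exactly what separates a Stanza from the still-open Stream that carries it.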

Now let's see what happens in a simple session between the client and the server. The following shows a typical interaction. For ease of readability, messages sent by the client are prefixed with C: and those sent by the server with S:. This and many more such examples can be found in the specification of XMPP Core, RFC 3920.

C: <?xml version="1.0"?> <stream:stream to="" xmlns="jabber:client" xmlns:stream="" version="1.0">

S: <?xml version="1.0"?> <stream:stream from="" id="someid" xmlns="jabber:client" xmlns:stream="" version="1.0">

... encryption, authentication, and resource binding ...

C: <message from="" to="" xml:lang="en"> <body>Art thou not Romeo, and a Montague?</body> </message>

S: <message from="" to="" xml:lang="en"> <body>Neither, fair saint, if either thee dislike.</body> </message>

C: </stream:stream>

S: </stream:stream>

Typical interaction.

By looking at the pattern of messages, we can tell that there are two separate XML documents involved here. One is opened by the client and terminated by it at the end; the other is opened and closed by the server. During an interaction, however, these two XML documents are interleaved.

Now let's try some talking with the GTalk server. XMPP will be our language and TCP/IP our medium.

To do a raw communication with any server in its native protocol, we need to be able to open a terminal session at a specific port on the server. We can use any of the available telnet clients such as Microsoft Telnet or Putty to do this. I chose Putty for historical reasons.

Let's first configure Putty to open a raw connection to the GTalk server on port 5222. Note that 5222 is the non-SSL port which the Jabber protocol uses. If we were to use 5223, which is the SSL enabled port, we would have difficulties doing our raw communication due to the encrypted nature of the medium.
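The same raw connection can be sketched in Python instead of Putty. This is only an illustration under stated assumptions: `build_stream_header` and `open_stream` are names invented here, the host and domain in the usage comment are placeholders rather than the GTalk server, and the stream namespace is the one given in RFC 3920. A real server may refuse the connection or answer differently.

```python
import socket

# Stream namespace as defined in RFC 3920.
STREAMS_NS = "http://etherx.jabber.org/streams"


def build_stream_header(domain):
    """Return the opening stream tag a client sends; it is deliberately left unclosed."""
    return (
        '<?xml version="1.0"?>'
        '<stream:stream to="{0}" xmlns="jabber:client" '
        'xmlns:stream="{1}" version="1.0">'
    ).format(domain, STREAMS_NS)


def open_stream(host, domain, port=5222, timeout=10):
    """Connect over plain TCP, send the stream header, and return the first reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_stream_header(domain).encode("utf-8"))
        return sock.recv(4096).decode("utf-8", errors="replace")


# Example use (hypothetical host and domain; use a server you are allowed to probe):
#   print(open_stream("xmpp.example.org", "example.org"))
```

Typing the header by hand into a raw terminal session and calling `open_stream` do the same thing: both write the opening `<stream:stream>` tag onto a TCP connection at port 5222 and wait for the server's own Stream to come back.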

Screenshot of Putty showing configuration for a raw connection to the GTalk server on port 5222

Here's another screenshot of the actual raw XMPP communication we've been talking about till now. The first and third XML fragments are sent by the client (us) while the second and fourth are sent by the server.

Screenshot of Putty showing raw XMPP interaction with the GTalk server

We first initiate the stream, with the "to" attribute of the Stream naming the domain we are addressing. The server then acknowledges the request by sending a Stream of its own, enumerating the features it supports and the method of encryption that it mandates. The <starttls xmlns="urn:ietf:params:xml:ns:xmpp-tls"><required/></starttls> element indicates that the server requires TLS. The client responds by sending another fragment, the <starttls xmlns="urn:ietf:params:xml:ns:xmpp-tls"/> element, indicating that it agrees to start a TLS negotiation.
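The decision the client makes at this point can be sketched as follows. The function names `inspect_starttls` and `next_client_fragment` are hypothetical. The sketch also assumes the server's features arrive as one complete fragment, and wraps that fragment in a throwaway `<wrapper>` element so the `stream:` prefix can be resolved outside of the real, still-open Stream.

```python
import xml.etree.ElementTree as ET

STREAMS_NS = "http://etherx.jabber.org/streams"  # stream namespace from RFC 3920
TLS_NS = "urn:ietf:params:xml:ns:xmpp-tls"

# The fragment the client sends to agree to a TLS negotiation.
STARTTLS_REQUEST = '<starttls xmlns="{0}"/>'.format(TLS_NS)


def inspect_starttls(features_xml):
    """Return (offered, required) for the <starttls/> feature in a features fragment."""
    # The stream: prefix is declared on the enclosing stream in real traffic,
    # so declare it on a temporary wrapper to make the fragment parseable alone.
    wrapped = '<wrapper xmlns:stream="{0}">{1}</wrapper>'.format(STREAMS_NS, features_xml)
    root = ET.fromstring(wrapped)
    starttls = root.find(".//{%s}starttls" % TLS_NS)
    if starttls is None:
        return (False, False)
    required = starttls.find("{%s}required" % TLS_NS) is not None
    return (True, required)


def next_client_fragment(features_xml):
    """Return the fragment to send next, or None when TLS is not offered."""
    offered, _required = inspect_starttls(features_xml)
    return STARTTLS_REQUEST if offered else None
```

When the server advertises `<starttls>` with `<required/>`, `inspect_starttls` reports both flags as true and `next_client_fragment` yields the `<starttls/>` element that we typed by hand in the raw session.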

After this line, the server again acknowledges by telling the client to proceed with TLS negotiation. This is followed by an SASL negotiation. Here again, we reach our scope boundaries.

After the authentication phase, the client and the server can start exchanging XML Stanzas. We, however, cannot reach this stage using the raw communication approach with the GTalk server, because the TLS negotiation and the SASL handshake both involve complex cryptographic exchanges, and the messages exchanged afterwards would no longer be human readable.

What next?

So far, we talked about identifying the protocols that applications use and discovered that GTalk is just another Jabber client. We learnt the basics of XMPP. We even successfully tried a preliminary raw XMPP communication with the GTalk server. Next, we shall advance a step higher and see how we can exploit the wealth of features provided by XMPP to play with GTalk!

To know more, read Fun with XMPP and Google Talk, Part 2.



They say, The Early Bird Catches The Worm. Had it not been for friends who pointed out mistakes and helped me fix them soon after it was published, this article would still have been in a bad shape. My sincere thanks to my friends Amod Pandey, Bharati K and Hemanth H M.

What others say

Rekha 29th October 2007
would want to read the second part!!!
Publish soon

srinivas MD 3rd November 2007
Nice one dude

Adarsh R 3rd November 2007
Thanks Rekha and Srini. Yes I shall soon publish the second part.

krishna 5th November 2007
cm ! good one !

Sarika 20th November 2007
you rocking like always... nice dude :)

Akheel 20th November 2007
Good going buddy.. Keep that up!!!

@rpIT 29th May 2008
Hey really a gud one dude..

Jaspal Singh 15th April 2009
Putty experiment does not seems to work now.
