QIP 2005 mojibake: the cause
Users of some ICQ clients (notably Pidgin) and Jabber-ICQ transports (JIT/pyICQt) may have experienced problems receiving messages from users of QIP 2005. Instead of the proper non-ASCII characters for their language, Cyrillic characters or question marks appeared. This research points out the cause of these problems and a solution to them.
QIP is a closed-source Russian client for the AIM and ICQ networks. It is not Unicode-aware, but it sends Unicode messages over the network by encoding the user input. This works with most recipients (using either official ICQ clients, Miranda IM, or another QIP), but in the case of Pidgin, Jabber transports and other minor clients, the message contains Cyrillic characters even though the sender may not have used them. The message looks as if it were converted to Unicode from the Russian code page (CP1251) instead of the one set in QIP preferences or the one used by the OS.
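The symptom is easy to reproduce outside of QIP. The following is a minimal sketch, assuming the sender's real code page is CP1250 (the one used in the experiment described further on); the function name is made up for illustration and nothing here is taken from QIP itself.

```python
def misread_as_cp1251(text: str, real_codepage: str = "cp1250") -> str:
    """Encode text in the sender's real code page, then decode
    the resulting bytes as if they were CP1251 (the suspected bug)."""
    raw = text.encode(real_codepage)
    return raw.decode("cp1251")


if __name__ == "__main__":
    # The Czech letter "č" is byte 0xE8 in CP1250; the same byte
    # is the Cyrillic letter "и" in CP1251. ASCII letters are unaffected.
    print(misread_as_cp1251("čaj"))
```

Only bytes above 0x7F change meaning between the two code pages, which matches the observation that plain ASCII text always arrives intact.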
This bug is known and has been admitted by the developers of QIP, but they refuse to fix it, saying that QIP 2005 is no longer being developed and that the new version to be released, QIP Infium, is free of this bug, which is true. However, Infium is still in beta stage and not freely available.
Still, the mystery is why the problem shows up with only some recipients. This is also the reason why many people refuse to do anything about this behavior, saying “it works right for them”. Note that senders always see their own messages correctly, and messages received from one of the “cursed” clients also show up correctly for them.
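A Wireshark analysis showed that messages sent to an affected client are encoded in UTF-16BE and contain Cyrillic characters, but when using QIP as the receiver, messages come encoded in (correct) UTF-8. This implies that QIP chooses the encoding depending on the client at the receiving end.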
To find out how exactly QIP determines which clients to send UTF-8 or UTF-16 to, I checked the client capabilities. Both Pidgin and QIP broadcast the “UTF-8 Messaging” cap, which has the CLSID of
0946134E-4C7F-11D1-8222-444553540000
This ID is also appended to every packet containing a UTF-8 encoded message.
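For anyone inspecting a capture, it helps to know what this ID looks like as raw bytes. This is a small sketch, assuming that capabilities travel as plain 16-byte values concatenated one after another, in the byte order in which the ID is written; the helper name `has_capability` is hypothetical.

```python
import uuid

# The "UTF-8 Messaging" capability as 16 raw bytes.
UTF8_CAP = uuid.UUID("0946134E-4C7F-11D1-8222-444553540000").bytes


def has_capability(caps_blob: bytes, cap: bytes) -> bool:
    """Return True if a 16-byte capability occurs in a blob of
    concatenated 16-byte capabilities (assumed wire layout)."""
    chunks = [caps_blob[i:i + 16] for i in range(0, len(caps_blob), 16)]
    return cap in chunks


if __name__ == "__main__":
    print(UTF8_CAP.hex())
```

Splitting the blob into 16-byte chunks before comparing avoids false matches that straddle two neighbouring capabilities.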
Further examination shows that most ICQ clients publish a DC (direct connection) protocol version: 11 for QIP, 9 for ICQ and SIM, 7 for Miranda. Pidgin does not send this information, and is shown with a DC protocol version of 0. (This information was once used to determine the client being used by the other party.)
So I set up an experiment: I opened two instances of QIP 2005 with two different ICQ accounts, set the “DC version” of one of them to zero, and messaged it from the other one. Where non-ASCII characters had been, question marks appeared at the receiver. A network capture once again showed that the message was encoded in UTF-16BE and used Cyrillic characters, which the receiver was not able to show because it used CP1250 for its view. Setting the DC version to anything greater than zero made the message arrive intact, using all characters of the Windows default code page (in my case, CP1250).
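The question marks seen in this experiment can be modelled as well. The sketch below assumes the faulty path is "system code page bytes, read as CP1251, sent as UTF-16BE", and that the receiving window replaces anything it cannot represent in CP1250 with "?". Both function names are illustrative only.

```python
def faulty_wire_bytes(text: str, sender_cp: str = "cp1250") -> bytes:
    """Model of the faulty path: bytes in the sender's code page are
    interpreted as CP1251 and the result is sent as UTF-16BE."""
    return text.encode(sender_cp).decode("cp1251").encode("utf-16-be")


def render_in_codepage(wire: bytes, receiver_cp: str = "cp1250") -> str:
    """Decode UTF-16BE and show it in a non-Unicode view; characters
    missing from the view's code page become question marks."""
    decoded = wire.decode("utf-16-be")
    return decoded.encode(receiver_cp, errors="replace").decode(receiver_cp)


if __name__ == "__main__":
    wire = faulty_wire_bytes("čaj")
    print(wire.hex(" "))             # Cyrillic code unit followed by ASCII
    print(render_in_codepage(wire))  # what a CP1250 view can display
```

A Unicode-aware receiver such as Pidgin skips the second step and therefore shows the Cyrillic letters themselves rather than question marks.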
Conclusion: In order to receive non-ASCII messages from this client properly, the receiver needs to a) advertise the “UTF-8 Messaging” capability, and b) publish a DC protocol version greater than 0.
Although this is clearly a fault on QIP's side, it continues to affect users, and the easiest way to get rid of the problems is to send the DC Info TLV along with the Location/User Info packet. It might not seem right to work around others' bugs, but QIP 2005 is still in mass use, its developers are unlikely to fix the bug, and even with the arrival of QIP Infium, many people will refuse to upgrade. Most clients already send the DC Info, so adding this as a feature to Pidgin or a Jabber ICQ transport need not be thought of as a fix for another application's bug, but rather as an improvement in OSCAR protocol conformance. It should be the users that matter, and making this problem go away would surely add to their satisfaction.
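To give an idea of how little is involved, here is a rough sketch of building such a TLV. The TLV type (0x000C) and the 37-byte field layout follow unofficial OSCAR protocol notes and are assumptions that should be checked against a real capture before use; the function name is hypothetical, and all fields other than the version are simply zeroed.

```python
import struct

DC_INFO_TLV_TYPE = 0x000C  # assumed type number for the DC Info TLV


def build_dc_info_tlv(dc_version: int) -> bytes:
    """Build a DC Info TLV that only fills in the protocol version."""
    value = struct.pack(
        ">IIBHIIIIIIH",   # big-endian, no padding
        0,                # internal IP address
        0,                # TCP port
        0,                # DC type
        dc_version,       # DC protocol version (must be > 0 for QIP 2005)
        0,                # auth cookie
        0,                # web front port
        0,                # client features
        0, 0, 0,          # three "last update" timestamps
        0,                # unknown trailing word
    )
    # TLV header: 2-byte type, 2-byte length, then the value.
    return struct.pack(">HH", DC_INFO_TLV_TYPE, len(value)) + value


if __name__ == "__main__":
    print(build_dc_info_tlv(9).hex(" "))
```

A real client would fill in sensible values for the other fields; the point here is only that a non-zero version can be published without implementing direct connections at all.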
IMPORTANT UPDATE: Clients which do not use “ICQ Server Relaying” (with the capability ID 09461349-4C7F-11D1-8222-444553540000) do not receive proper UTF-8 from QIP 2005 even if they publish the DC info. This applies to Pidgin, among others.
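Putting the findings together, the observed behaviour can be summarised as a small decision function. This is only a model of what the captures suggest, not QIP's actual code (which is closed-source); the `Peer` type and the function name are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    utf8_cap: bool          # advertises "UTF-8 Messaging"
    server_relay_cap: bool  # advertises "ICQ Server Relaying"
    dc_version: int         # published DC protocol version, 0 if absent


def qip2005_outgoing_encoding(peer: Peer) -> str:
    """Guess which encoding QIP 2005 uses for a given recipient,
    based purely on the observations in this article."""
    if peer.utf8_cap and peer.server_relay_cap and peer.dc_version > 0:
        return "utf-8"
    # Otherwise the text appears to go out as UTF-16BE,
    # converted from CP1251 regardless of the real code page.
    return "utf-16-be via cp1251"


if __name__ == "__main__":
    pidgin_like = Peer(utf8_cap=True, server_relay_cap=False, dc_version=0)
    qip_like = Peer(utf8_cap=True, server_relay_cap=True, dc_version=11)
    print(qip2005_outgoing_encoding(pidgin_like))
    print(qip2005_outgoing_encoding(qip_like))
```

A client author can use the same three conditions as a checklist for what needs to be advertised to get correct messages from QIP 2005.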