Internet Relay Chat (IRC) is a text-based computer-mediated communication (CMC) service in which people can meet and chat in real time. Most chat takes place in channels named after a specific topic, such as #usa or #linux, and a user connected to an IRC network can take part in several channels at once. For a long time the only major IRC network was EFnet, founded in 1990. During the 1990s three other major IRC networks developed: Undernet (1993), DALnet (1994) and IRCnet (which split from EFnet in June 1996). Several factors led to the separate development of IRC networks: the fast growth of user numbers, the poor scalability of the IRC protocol, and disagreements about content, such as whether to allow or prohibit 'bot programs. Today, regional IRC networks are emerging, such as BrasNet for Brazilian users, and the global networks are becoming increasingly regional: IRCnet users are mostly European, while EFnet users come mostly from the Americas and Australia.
All persons connected to an IRC network at a given moment make up that network's 'user space'. Since people are constantly signing on and off, this user space keeps changing. The total number of users who have ever visited a specific IRC network could be called its 'social space', and a network's social space is far larger than its user space at any one time.
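The difference between user space and social space can be made concrete with a small calculation. The Python sketch below is illustrative only: the `Session` record, the function names and the sample sessions are hypothetical, and it assumes each person can be recognised across sessions, which IRC nicknames alone do not guarantee. User space at time t counts the people connected at t; social space counts everyone who has ever appeared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One connection of a person to a network (hypothetical record)."""
    user: str   # assumed stable identity of the person
    start: int  # sign-on time, in minutes
    end: int    # sign-off time, in minutes (exclusive)


def user_space(sessions: list[Session], t: int) -> int:
    """Distinct users connected at time t."""
    return len({s.user for s in sessions if s.start <= t < s.end})


def peak_user_space(sessions: list[Session]) -> int:
    """Highest simultaneous user count; it can only rise at a sign-on time."""
    return max(user_space(sessions, s.start) for s in sessions)


def social_space(sessions: list[Session]) -> int:
    """Distinct users who have ever been on the network."""
    return len({s.user for s in sessions})


sessions = [
    Session("alice", 0, 60),
    Session("bob", 30, 90),
    Session("carol", 120, 180),
    Session("alice", 150, 200),
]
print(user_space(sessions, 45))   # 2 (alice, bob)
print(peak_user_space(sessions))  # 2
print(social_space(sessions))     # 3
```

Even in this toy example, the peak simultaneous count (the kind of figure reported in Table 2) stays below the number of distinct people who used the network.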
IRC has been researched almost from its beginning: it was developed in 1988, and the first research became available in late 1991 (Reid). Resources on its quantitative development, however, are rare. To remedy this, a quantitative data-logging 'bot program, Socip, was created and set to run on various IRC networks. Socip has now been running on several IRC networks for almost two years, giving Internet researchers empirical data on the quantitative development of IRC.
Any approach to gathering quantitative data on IRC needs to fulfil the following tasks:
- Store the number of users on an IRC network at regular intervals, e.g. every five minutes;
- Store the number of channels; and,
- Store the number of servers.
This information can be obtained by entering the '/lusers' command by hand in an ircII client, which yields results such as those in Table 1.
Table 1: Number of IRC users on January 31st, 1995
During the first months of 1995, it was even possible to obtain information on all users with the '/who **' command. On today's major IRC networks with more than 50000 users, however, the IRC server program refuses this: it terminates the connection because the client cannot accept that amount of data quickly enough. In addition, collecting these data manually is an exhausting and repetitive task, better suited to automation. Three approaches to automation were tried during development.
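Whichever way it is automated, each '/lusers' reply must be reduced to the three numbers listed above. The Python sketch below parses the raw server lines, assuming the reply wording given in RFC 1459 (numeric 251, RPL_LUSERCLIENT, for users, invisible users and servers; numeric 254, RPL_LUSERCHANNELS, for channels). Actual server software may phrase these replies differently, and the function name, the `Snapshot` record and the sample lines are hypothetical. Since RFC 1459 allows a server to omit reply 254 when the count is zero, a missing channel line is read as zero channels.

```python
import re
from dataclasses import dataclass
from typing import Optional

# RFC 1459 wording; other server implementations may phrase these replies differently.
RPL_LUSERCLIENT = re.compile(
    r"^:\S+ 251 \S+ :There are (\d+) users and (\d+) invisible on (\d+) servers"
)
RPL_LUSERCHANNELS = re.compile(r"^:\S+ 254 \S+ (\d+) :")


@dataclass
class Snapshot:
    users: int     # visible + invisible users
    channels: int
    servers: int


def parse_lusers(lines: list[str]) -> Optional[Snapshot]:
    """Turn the raw lines of one LUSERS reply into a Snapshot, or None if 251 is missing."""
    users: Optional[int] = None
    servers = 0
    channels = 0  # reply 254 may be omitted when there are no channels
    for line in lines:
        if m := RPL_LUSERCLIENT.match(line):
            users = int(m.group(1)) + int(m.group(2))
            servers = int(m.group(3))
        elif m := RPL_LUSERCHANNELS.match(line):
            channels = int(m.group(1))
    if users is None:
        return None
    return Snapshot(users=users, channels=channels, servers=servers)


reply = [
    ":irc.example.net 251 Socip_E1 :There are 1000 users and 3000 invisible on 12 servers",
    ":irc.example.net 252 Socip_E1 25 :operator(s) online",
    ":irc.example.net 254 Socip_E1 1500 :channels formed",
    ":irc.example.net 255 Socip_E1 :I have 300 clients and 1 servers",
]
print(parse_lusers(reply))  # Snapshot(users=4000, channels=1500, servers=12)
```

Counting invisible users together with visible ones matters here, because on large networks many users set themselves invisible and would otherwise be missed.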
The 'Eggdrop' approach
The 'Eggdrop' 'bot is one of the best-known IRC 'bot programs. Once programmed, 'bots can act autonomously on an IRC network, and Eggdrop was considered particularly convenient because customised modules could be installed easily. However, testing showed that the Eggdrop 'bot was unsuitable for two reasons. The first was technical: for undetermined reasons, all Eggdrop modules caused heavy CPU load, making it impossible to run several Eggdrops simultaneously to study a number of IRC networks. The second concerned the statistics to be collected. The objective was to write a snapshot of current IRC users and channel use to an ASCII file every five minutes, and it proved impossible to extend Eggdrop so that it would periodically submit the '/lusers' command and write the received data to a file. For these reasons, and because of some security concerns, the Eggdrop approach was abandoned.
The ircII script approach
ircII is a UNIX IRC client with its own scripting language, which makes it possible to write command files that periodically submit the '/lusers' command to any chosen IRC server and log its output. Four scripts were used to monitor IRCnet, EFnet, DALnet and Undernet from January to October 1998; they were named Socius_D, Socius_E, Socius_I and Socius_U after the network each one monitored. Every hour, each script stored the number of users and channels in a log file, which could be examined with a separate Perl script.
The ircII script approach had several drawbacks. The need for a terminal could be avoided with the 'screen' package, which makes it possible to start ircII, run the scripts, detach, and log off again; however, ircII and the scripts could not be restarted by an automatic task scheduler. Periodic manual checks were therefore required to find out whether the scripts were still running and to restart them if needed (e.g. after the server connection was lost). These checks showed that after 10 hours at least one of the scripts would no longer be running. Further disadvantages were the lengthy log files and the need for a second program to extract their data into another file from which meaningful graphs could be created.
The failure of the Eggdrop and ircII scripting approaches led to the solution still in use today.
Perl script-only approach
Perl is a powerful scripting language for handling file-oriented data when speed is not critical. Perl 5 can be extended with many modules, including the Net::IRC package, whose object-oriented interface enables Perl scripts to connect to an IRC server and use the basic IRC commands.
The Socip.pl program contains all the server definitions needed to create connections. Socip currently monitors ten major IRC networks, including DALnet, EFnet, IRCnet, the Microsoft Network, Talkcity, Undernet and Galaxynet. When run, Socip (the "Social science IRC program") selects a nickname from the list for the given network; for EFnet, the first nickname used is Socip_E1. From then on it behaves much like a 'bot. Using that nickname, Socip tries to open an IRC connection to a server of the network. If the connection succeeds, handlers are set up that react appropriately to IRC server messages (such as PING/PONG, message output and replies). Socip then joins the channel #hose (the name has no special meaning), which serves as a maintenance channel and has the side effect that real people occasionally come across the 'bot and try to interact with it. These interactions are logged as well.
While in that channel, the script repeatedly sleeps and checks whether a set interval (five minutes by default) has passed. Once it has, the output of the '/lusers' command is stored in the data file for that IRC network, and the network's RRD (Round Robin Database) file is updated. This database is organised chronologically and keeps recent events in great detail and older events in condensed form: user and channel data less than 10 days old are stored at five-minute resolution, while data older than two years are automatically averaged and stored at a resolution of one day.
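The round robin idea can be illustrated with fixed-size buffers. The Python sketch below is not RRDtool's file format or API; it is a hypothetical two-archive store (`TwoLevelRRD`) that keeps 10 days of five-minute samples, as described above, plus daily averages in a second buffer whose capacity is an assumption. It consolidates by counting 288 samples per day and ignores timestamps and missing samples, which RRDtool handles explicitly.

```python
from collections import deque

STEP_SECONDS = 300                         # one sample every five minutes
SAMPLES_PER_DAY = 86400 // STEP_SECONDS    # 288


class TwoLevelRRD:
    """Hypothetical store: five-minute detail plus daily averages."""

    def __init__(self, detail_days: int = 10, daily_slots: int = 1000):
        # deque(maxlen=n) behaves like a round robin archive:
        # once full, appending a value silently drops the oldest one.
        self.detail = deque(maxlen=detail_days * SAMPLES_PER_DAY)
        self.daily = deque(maxlen=daily_slots)  # capacity is an assumption
        self._today: list[float] = []           # samples of the current day

    def update(self, value: float) -> None:
        self.detail.append(value)
        self._today.append(value)
        if len(self._today) == SAMPLES_PER_DAY:
            # consolidate a complete day into one averaged value
            self.daily.append(sum(self._today) / SAMPLES_PER_DAY)
            self._today.clear()


rrd = TwoLevelRRD(detail_days=1, daily_slots=3)
for day in range(1, 6):               # five days of constant, made-up counts
    for _ in range(SAMPLES_PER_DAY):
        rrd.update(1000 * day)
print(len(rrd.detail))   # 288: only one day of detail is kept
print(list(rrd.daily))   # [3000.0, 4000.0, 5000.0]: older daily values were overwritten
```

The trade-off is the one the text describes: storage stays constant no matter how long Socip runs, at the price of losing detail for older periods.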
Socip also handles network problems. For example, when it detects that its connection has been terminated, it pauses and then tries to reconnect using the next nickname on its list, which prevents nickname collisions. If the IRC server fails to respond to the '/lusers' command three times in a row, Socip moves on to the next server on the list. Special scripts invoked by cron restart Socip when necessary, for example after the script has terminated because of network problems, a kill by an IRC operator, or a power failure, and all scripts are restarted automatically after a reboot. All monitoring is done on a Linux machine (Pentium 120, 32 MB RAM, Debian Linux 2.1) that runs continuously. The processor load is modest, and the same machine also serves as the Sociology Department's WWW server.
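The failover rules just described can be sketched as a small state machine. This is a minimal Python sketch, not Socip's actual Perl code: the class `Failover`, its method names, the nickname Socip_E2 and the server names are hypothetical, and the pause before reconnecting, the socket handling and the cron watchdog are left out. Only the nickname rotation on disconnect and the switch after three consecutive unanswered '/lusers' requests come from the text.

```python
from itertools import cycle

MAX_MISSED = 3  # consecutive unanswered /lusers requests before switching server


class Failover:
    """Hypothetical bookkeeping for nickname rotation and server switching."""

    def __init__(self, nicknames: list[str], servers: list[str]):
        self._nicks = cycle(nicknames)
        self._servers = cycle(servers)
        self.nick = next(self._nicks)
        self.server = next(self._servers)
        self.missed = 0

    def on_disconnect(self) -> None:
        # Reconnect under the next nickname, so the old one cannot collide
        # with a connection the network still believes is alive.
        self.nick = next(self._nicks)
        self.missed = 0

    def on_lusers_reply(self) -> None:
        self.missed = 0  # only consecutive failures count

    def on_lusers_timeout(self) -> bool:
        """Record a missed reply; return True if the server was switched."""
        self.missed += 1
        if self.missed < MAX_MISSED:
            return False
        self.server = next(self._servers)
        self.missed = 0
        return True


f = Failover(["Socip_E1", "Socip_E2"], ["irc-a.example", "irc-b.example"])
f.on_disconnect()
print(f.nick)                                     # Socip_E2
print([f.on_lusers_timeout() for _ in range(3)])  # [False, False, True]
print(f.server)                                   # irc-b.example
```

When `on_lusers_timeout` returns True, the caller would close the current connection and reconnect to `f.server`.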
Graphs are created from the data in Socip's RRD files using the MRTG (Multi Router Traffic Grapher) program by Tobias Oetiker. A script updates all IRC graphs four times a day. Usage of each IRC network is visualised in five graphs: daily, weekly and monthly graphs of users and channels, plus two graphs covering all data collected so far, one for users and channels and one for servers. All this information is continuously published on the World Wide Web at http://www.hinner.com/ircstat.
The following samples demonstrate what information Socip can produce.
Figure 1 shows the rise in EFnet users from about 40000 in November 1998 to 65000 in July 2000. The sampled values oscillate around an average; this oscillation results from users being spread across different time zones.
Fig. 1: EFnet - Users and Channels since November 1998
Figure 2 illustrates the decline in the number of interconnected EFnet servers over the years, which means that each server now handles more and more users. Reasons for taking IRC servers off the network include security concerns (attacks on servers by malicious persons), new payment schemes, and the effort and cost of maintenance.
Fig. 2: EFnet - Servers since November 1998
Figure 3 is a good example of a weekly graph with strong fluctuations: it shows peaks shortly before 6 pm CEST and almost no users shortly after midnight.
Fig. 3: Galaxynet: Weekly Graph (July 15th-22nd, 2000)
The daily graph portrays usage variations in even more detail. Figure 4 shows Undernet user and channel data; the vertical gap in the graph indicates missing data, caused either by a net split or by other network problems.
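Gaps like the one in Figure 4 can also be located in the raw five-minute data. The Python sketch below flags any interval between consecutive sample timestamps that is clearly longer than the five-minute step; the function name and the 1.5 tolerance factor, which absorbs small polling delays, are assumptions. RRDtool itself marks such intervals as unknown when no update arrives within the data source's heartbeat.

```python
STEP = 300  # expected seconds between samples (five minutes)


def find_gaps(timestamps: list[int], step: int = STEP,
              tolerance: float = 1.5) -> list[tuple[int, int, int]]:
    """Return (last sample before gap, first sample after gap, samples missing)."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > tolerance * step:
            gaps.append((prev, cur, round((cur - prev) / step) - 1))
    return gaps


samples = [0, 300, 600, 2400, 2700]  # made-up timestamps in seconds
print(find_gaps(samples))            # [(600, 2400, 5)]
print(find_gaps([0, 310, 605]))      # []: small polling delays are not gaps
```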
Fig. 4: Undernet: Daily Graph: July 22nd, 2000
The final example (Figure 5) shows a monthly graph of the Webchat (http://www.webchat.org) network. Every day the user count varies between 5000 and nearly 20000, and the number of channels fluctuates in step, between 2500 and 5000.
Fig. 5: Webchat: Monthly graph, Weeks 24-29, 2000
Not every IRC user is permanently connected to an IRC network. The share of users who stay connected may have increased lately with the spread of flat rates and cheap Internet access, but in general most users sign off the network after some time. This is why IRC is a very dynamic society whose membership is constantly in flux. Maximum user counts only give the highest number of members who were online simultaneously at some point; the total number of users of a network, including those who use the service but are not signed on at that moment, can only be guessed. Answering these questions requires more thorough investigation, which might also allow inflows and outflows to be estimated more readily.
Table 2 shows the all-time maximum user counts of seven IRC networks, compared with the average numbers of users of the four major IRC networks during the third quarter of 1998 (based on available data).
Table 2: Maximum user counts of selected IRC networks
| | DALnet | EFnet | Galaxy Net | IRCnet | MS Chat | Undernet | Webchat |
|---|---|---|---|---|---|---|---|
| 3rd Q. 1998 | 21000 | 37000 | | | | | |
Compared with the 200-300 users of 1991 and the 7000 IRC users of 1994, recent growth is certainly extraordinary: the maximum user counts add up to a total of 306573 users across all monitored networks. The 500000-user threshold can be expected to be passed some time during 2001.
Finally, Web-based chat systems will clearly become more and more common in the future. These chat services do not use the standard IRC protocol and will be very hard to monitor. Given that such systems are already quite popular, the actual number of chat users in the world may well have passed the half-million mark already.