One particular thing has always surprised me in languages like Java and C#. Consider the following piece of code, taken from
this file in ChanServ (a bot in the Spring lobby server for moderation/administration purposes.)
// precondition: there exists a logfile with filename ,
// which may contain a very large amount of lines, which are
// to be inserted in a MySQL database.
ArrayList<Long> stamps = new ArrayList<Long>();
ArrayList<String> lines = new ArrayList<String>();
String update = "";
BufferedReader in = new BufferedReader(new FileReader(file));
String line;
int lineCount = 0;
while ((line = in.readLine()) != null) {
if (line.trim().equals("")) continue; // empty line
long stamp;
try {
stamp = Long.parseLong(line.substring(0, line.indexOf(' ')));
} catch (NumberFormatException e) {
Log.error("Line from log file " + file.getName() + " does not contain time stamp. Line is: \"" + line + "\"");
return ;
}
if (lineCount == 0) {
update += "INSERT INTO `" + name + "` (stamp, line) values (?, ?)";
} else {
update += ","+ Misc.EOL + "(?, ?)";
}
stamps.add(stamp);
lines.add(line.substring(line.indexOf(' ')+1, line.length()));
lineCount++;
}
in.close();
update += ";";
// it continues by creating a prepared SQL statement and setting
// the parameters using the stamps and lines ArrayLists...
Now, do you already see the issue? Don't worry if you don't - I've seen the same issue in various projects (including commercial projects) already, so you're not the only one who doesn't see it.
Let me give you a hint related to this specific code: One day the box hosting the Spring lobby server went down. So when it came back up I began to restart the lobby server software. First the lobby server itself. All was fine. Then ChanServ, the bot which contained the above code back then. I ran it's startup script, and waited..., and kept waiting...! The script didn't finish. At all.
Any clue yet?
No? Let me explain.
What - after some debugging - happened to be the case, was that this code in ChanServ was being invoked to import some rather large log files into the database. Now take a look at the code again: for every line in the log file, it appends a small string to
update. This wouldn't be bad, if appending to a string in Java and .NET wasn't
O(n), with n the length of the
total string. In other words, the algorithm above as a whole was O(n
2) instead of O(n) with respect to the number of lines in the file! (For example, this means that if importing 1000 lines takes 1 second, then importing 1 million lines takes, no, not half an hour, but a whopping 278 hour!)
It was trivial to rewrite the above code using
StringBuilder. Afterwards, ChanServ started up and imported the log files into the database in a few minutes only.
StringBuilder update = new StringBuilder();
while ((line = in.readLine()) != null) {
// ...
if (lineCount == 0) {
update.append( "INSERT INTO `" + name + "` (stamp, line) values (?, ?)" );
} else {
update.append( ","+ Misc.EOL + "(?, ?)" );
}
// ...
}
All in all, I want to urge you as C# or Java developer (or any other language with immutable strings) to keep in mind the algorithmic complexity of string concatenation. In simpler terms,
I want you to use StringBuilder / StringBuffer whenever you are concatenating more then a few hundred characters into a string, to prevent nasty suprises like the one above.