// precondition: there exists a logfile with filename,
// which may contain a very large amount of lines, which are
// to be inserted in a MySQL database.
ArrayList<Long>stamps = new ArrayList <Long><String> ();
ArrayList<String> lines = new ArrayList ();
String update = "";
BufferedReader in = new BufferedReader(new FileReader(file));
String line;
int lineCount = 0;
while ((line = in.readLine()) != null) {
if (line.trim().equals("")) continue; // empty linelong stamp; try { stamp = Long.parseLong(line.substring(0, line.indexOf(' '))); } catch (NumberFormatException e) { Log.error("Line from log file " + file.getName() + " does not contain time stamp. Line is: \"" + line + "\""); return ; } if (lineCount == 0) { update += "INSERT INTO `" + name + "` (stamp, line) values (?, ?)"; } else { update += ","+ Misc.EOL + "(?, ?)"; }
stamps.add(stamp); lines.add(line.substring(line.indexOf(' ')+1, line.length()));
// it continues by creating a prepared SQL statement and setting lineCount++;
}
in.close();
update += ";";
// the parameters using the stamps and lines ArrayLists...
Now, do you already see the issue? Don't worry if you don't - I've seen the same issue in various projects (including commercial projects) already, so you're not the only one who doesn't see it.
Let me give you a hint related to this specific code: One day the box hosting the Spring lobby server went down. So when it came back up I began to restart the lobby server software. First the lobby server itself. All was fine. Then ChanServ, the bot which contained the above code back then. I ran it's startup script, and waited..., and kept waiting...! The script didn't finish. At all.
Any clue yet?
No? Let me explain.
What - after some debugging - happened to be the case, was that this code in ChanServ was being invoked to import some rather large log files into the database. Now take a look at the code again: for every line in the log file, it appends a small string to update. This wouldn't be bad, if appending to a string in Java and .NET wasn't O(n), with n the length of the total string. In other words, the algorithm above as a whole was O(n2) instead of O(n) with respect to the number of lines in the file! (For example, this means that if importing 1000 lines takes 1 second, then importing 1 million lines takes, no, not half an hour, but a whopping 278 hour!)
It was trivial to rewrite the above code using StringBuilder. Afterwards, ChanServ started up and imported the log files into the database in a few minutes only.
StringBuilder update = new StringBuilder();
while ((line = in.readLine()) != null) {
// ...
if (lineCount == 0) {
update.append( "INSERT INTO `" + name + "` (stamp, line) values (?, ?)" );
} else {
update.append( ","+ Misc.EOL + "(?, ?)" );
}
// ...}
All in all, I want to urge you as C# or Java developer (or any other language with immutable strings) to keep in mind the algorithmic complexity of string concatenation. In simpler terms, I want you to use StringBuilder / StringBuffer whenever you are concatenating more then a few hundred characters into a string, to prevent nasty suprises like the one above.
This is interesting for Spring modders too: Lua has immutable strings. You can use table.concat instead of multiple concatenations, in your scripts.
ReplyDeleteYeah, I was wondering whether to include that explicitly or not. In the end I decided not to include it because the post had become fairly long already. Maybe something for another post. Thanks for reading btw :-)
ReplyDelete