Sunday, 20 March 2011

Ouch, that hurts... Hello, IBM quality control, are you out there?

Further analysing the code in lsxsd.lss, I'm utterly disgusted by what I found. I'm sorry to use such strong words, but they reflect how bad it is. In particular, I looked at the NotesStream to Base64 encoding (XSD_DATATYPE_CONVERTER.notesStreamToBase64). For my file download tests I used a 3.2MB image file, and the method brought the server to its knees. Here are the reasons:

  1. The method returns a string and therefore uses string concatenation to build the return value. The problem is that the longer a string gets, the longer each concatenation takes, because every concatenation copies the whole string built so far into a new one. The cost of a single concatenation grows linearly with the string length, so building the result line by line grows quadratically with the output size (not exponentially, but bad enough). Once the string reaches 2MB, every further concatenation takes pretty much forever in server code execution terms.
  2. The method reads the NotesStream byte by byte... OUCH!!! This is probably the best way to render the NotesStream class useless.
  3. Although only strings are handled, the concatenation uses the much slower & (Variant) operator instead of the + (String) operator.
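
To put a rough number on point 1, here is a cost model (an illustration, not a measurement) written in Java so it can be run standalone. It assumes that every concatenation copies the entire accumulated string, which is what the `result = result & line` pattern implies, and compares that with appending to a growable buffer. The class name `ConcatCost` is made up for this example.

```java
// Cost model only (not a benchmark): assumes every concatenation copies the
// whole accumulated string into a new string.
final class ConcatCost {

    // Characters copied when `lines` lines of `lineLen` characters are appended
    // one at a time to an immutable string: lineLen * (1 + 2 + ... + lines).
    static long naiveCopies(long lines, long lineLen) {
        long total = 0;
        long currentLength = 0;
        for (long k = 0; k < lines; k++) {
            currentLength += lineLen;  // result = result & line ...
            total += currentLength;    // ... copies the entire new string
        }
        return total;
    }

    // Characters written when appending to a growable buffer or stream:
    // each character is written once (buffer growth amortizes to O(n)).
    static long bufferedCopies(long lines, long lineLen) {
        return lines * lineLen;
    }
}
```

For a file of 3,267,584 bytes (the test-file size quoted in the comments), the encoder produces 57,327 lines (3,267,584 / 57, rounded up) of at most 78 characters (76 plus CRLF). `naiveCopies(57327, 78)` comes to roughly 1.3 × 10^11 copied characters, whereas `bufferedCopies(57327, 78)` is about 4.5 × 10^6.
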
Here is the original code. The two most significant lines are the byte-by-byte read, buf = ns.Read(1&), and the concatenation that appends each output line to the return value:

Function notesStreamToBase64 (ns As NotesStream) As String
   Const b64Chars$ = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
   Dim nsLength As Long
   nsLength = ns.Bytes
   ns.Position = 0
   Dim numPads As Integer
   numPads = (3 - (nsLength Mod 3)) Mod 3
   While nsLength > 0
      ' Output lines are limited to 76 chars, and because every
      ' three input bytes produce 4 output chars we process
      ' up to 57 input bytes at a time. (57 = 76 / 4 * 3)
      Dim inLength As Long
      inLength = nsLength
      If inLength > 57 Then inLength = 57
      nsLength = nsLength - inLength
      Dim outString As String
      outString = ""
      Dim idx As Integer
      idx = 0
      While idx < inLength
         ' Collect up to 24 bits (3 bytes) of input data
         Dim outBits As Integer
         outBits = 0
         Dim bits24 As Long
         bits24 = 0
         Dim i As Integer
         For i = 0 To 2
            bits24 = bits24 * 256
            If idx + i < inLength Then
               Dim buf As Variant
               buf = ns.Read(1&)
               bits24 = bits24 + buf(0)
               outBits = outBits + 8
            End If
         Next
         idx = idx + 3
         Dim numChars As Integer
         numChars = 4
         If outBits <> 24 Then
            numChars = 4 - numPads
         End If
         For i = 1 To numChars
            Dim bits6 As Integer
            bits6 = (bits24 And &HFC0000&) / 262144
            outString = outString & Mid$(b64Chars, bits6 + 1, 1)
            bits24 = (bits24 And &H3FFFF&) * 64
         Next
         If numChars <> 4 Then
            For i = 1 To numPads
               outString = outString & "="
            Next
         End If
      Wend
      ' Add another line of base64 output to the return string
      notesStreamToBase64 = notesStreamToBase64 & outString & Chr$(13) & Chr$(10)
   Wend
End Function
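
The bit arithmetic in the inner loop is easier to follow when written with shifts. The Java sketch below (Java rather than LotusScript so it can run standalone; `Base64Group` and `encodeGroup` are made-up names) encodes one group of one to three bytes the same way: `bits24 * 256` is a left shift by 8, dividing `(bits24 And &HFC0000&)` by 262144 (2^18) is a right shift by 18, and `(bits24 And &H3FFFF&) * 64` moves the next six bits into the top position. A short final group yields count + 1 characters plus `=` padding, matching `numChars = 4 - numPads`.

```java
// Sketch: encodes one Base64 group, mirroring the bits24/bits6 arithmetic
// of the LotusScript code above.
final class Base64Group {
    private static final String B64_CHARS =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes `count` (1..3) bytes starting at in[offset] as 4 characters.
    static String encodeGroup(byte[] in, int offset, int count) {
        int bits24 = 0;
        for (int i = 0; i < 3; i++) {
            bits24 <<= 8;                            // bits24 = bits24 * 256
            if (i < count) {
                bits24 |= in[offset + i] & 0xFF;     // byte as unsigned 0..255
            }
        }
        int numChars = count + 1;                    // 3 -> 4, 2 -> 3, 1 -> 2
        StringBuilder out = new StringBuilder(4);
        for (int i = 0; i < numChars; i++) {
            int bits6 = (bits24 & 0xFC0000) >>> 18;  // (bits24 And &HFC0000&) / 262144
            out.append(B64_CHARS.charAt(bits6));
            bits24 = (bits24 & 0x3FFFF) << 6;        // (bits24 And &H3FFFF&) * 64
        }
        while (out.length() < 4) {
            out.append('=');                         // pad a short final group
        }
        return out.toString();
    }
}
```

For example, the three ASCII bytes of "Man" encode to "TWFu", while the first two alone give "TWE=" and the first one alone gives "TQ==".
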

And this is how it could (or should) look:

Function notesStreamToBase64 (ns As NotesStream) As NotesStream
   On Error GoTo errhandle
   Print Now   ' start time, for performance testing
   Const b64Chars$ = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
   Dim session   As New NotesSession
   Dim batchsize As Long
   Dim inLength  As Long
   Dim outString As String
   Dim idx       As Long
   Dim outBits   As Long
   Dim bits24    As Long
   Dim i         As Long
   Dim nsLength  As Long
   Dim numPads   As Long
   Dim buf       As Variant
   Dim numChars  As Long
   Dim bits6     As Long
   Dim bufcnt    As Long
   Dim retbyt    As Long
   Set notesStreamToBase64 = session.CreateStream()
   batchsize   = 57
   bufcnt      = 32766   ' forces a 32767-byte read when the first byte is needed
   nsLength    = ns.Bytes
   inLength    = batchsize
   ns.Position = 0
   numPads = (3 - (nsLength Mod 3)) Mod 3
   While nsLength > 0
      ' Output lines are limited to 76 chars, and because every
      ' three input bytes produce 4 output chars we process
      ' up to 57 input bytes at a time. (57 = 76 / 4 * 3)
      If nsLength < batchsize Then
         inLength = nsLength
      End If
      nsLength  = nsLength - inLength
      outString = ""
      idx       = 0
      While idx < inLength
         ' Collect up to 24 bits (3 bytes) of input data
         outBits = 0
         bits24  = 0
         For i = 0 To 2
            bits24 = bits24 * 256
            If idx + i < inLength Then
               ' Read the stream in 32767-byte blocks instead of byte by byte
               If bufcnt = 32766 Then
                  bufcnt = -1
                  buf    = ns.Read(32767)
               End If
               bufcnt  = bufcnt  + 1
               bits24  = bits24  + buf(bufcnt)
               outBits = outBits + 8
            End If
         Next
         idx      = idx + 3
         numChars = 4
         If outBits <> 24 Then
            numChars = 4 - numPads
         End If
         For i = 1 To numChars
            bits6     = (bits24 And &HFC0000&) / 262144
            outString = outString + Mid$(b64Chars, bits6 + 1, 1)
            bits24    = (bits24 And &H3FFFF&) * 64
         Next
         If numChars <> 4 Then
            For i = 1 To numPads
               outString = outString + "="
            Next
         End If
      Wend
      ' Append this line of base64 output to the result stream;
      ' WriteText with EOL_CRLF adds the line break
      retbyt = notesStreamToBase64.WriteText(outString, EOL_CRLF)
   Wend
   ' Rewind so the caller can read the result from the start
   notesStreamToBase64.Position = 0

errexit:
   Print Now   ' end time
   Exit Function

errhandle:
   ' Report the error and stop, rather than resuming with corrupt output
   Print Erl & ", " & Error
   Resume errexit
End Function
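
The same two fixes can be sketched outside Domino. The Java version below (hypothetical class `ChunkedBase64`; a `java.io.InputStream` stands in for the NotesStream) reads the input in 32767-byte chunks, mirroring `ns.Read(32767)`, and appends all output to a single `StringBuilder`, so each output character is written once instead of being recopied on every concatenation. Like the LotusScript code, it encodes 57 input bytes per 76-character line and ends every line, including the last, with CRLF. It is a sketch under those assumptions, not a drop-in replacement for lsxsd.lss.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch: line-oriented Base64 (57 input bytes -> 76 chars + CRLF),
// reading input in large chunks and writing to one growable buffer.
final class ChunkedBase64 {
    private static final String B64_CHARS =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    private static final int LINE_BYTES = 57;  // 57 bytes = 19 groups = 76 chars
    private static final int CHUNK = 32767;    // same block size as ns.Read(32767)

    static String encode(InputStream in) throws IOException {
        StringBuilder out = new StringBuilder();
        byte[] chunk = new byte[CHUNK];
        int[] group = new int[3];
        int groupLen = 0;   // bytes collected for the current 3-byte group
        int lineBytes = 0;  // input bytes already encoded on the current line
        int n;
        while ((n = in.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                group[groupLen++] = chunk[i] & 0xFF;  // unsigned byte value
                if (groupLen == 3) {
                    appendGroup(out, group, 3);
                    groupLen = 0;
                }
                if (++lineBytes == LINE_BYTES) {      // 57 is a multiple of 3,
                    out.append("\r\n");               // so no group is split
                    lineBytes = 0;
                }
            }
        }
        if (groupLen > 0) {
            appendGroup(out, group, groupLen);        // padded final group
        }
        if (lineBytes > 0) {
            out.append("\r\n");                       // terminate the last line
        }
        return out.toString();
    }

    // Convenience overload for in-memory data.
    static String encode(byte[] data) {
        try {
            return encode(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);        // not expected for byte arrays
        }
    }

    // Encodes `count` (1..3) bytes from g as 4 characters, padding with '='.
    private static void appendGroup(StringBuilder out, int[] g, int count) {
        int bits24 = 0;
        for (int i = 0; i < 3; i++) {
            bits24 = (bits24 << 8) | (i < count ? g[i] : 0);
        }
        for (int i = 0; i <= count; i++) {           // count + 1 data characters
            out.append(B64_CHARS.charAt((bits24 >>> (18 - 6 * i)) & 0x3F));
        }
        for (int i = count; i < 3; i++) {
            out.append('=');
        }
    }
}
```

For any non-empty input, the result should equal `java.util.Base64.getMimeEncoder().encodeToString(data) + "\r\n"`: the JDK's MIME encoder uses the same 76-character lines separated by CRLF, but adds no separator after the last line.
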


This is probably not the best way to implement Base64 encoding, and I'm not a Base64 expert in the first place. I simply made the original code work (more efficiently). There is still room for improvement, but at the current stage I have brought the processing time down from "forever" (I restarted the server after 10 minutes of processing the 3MB file) to 24 seconds. Still a bit slow, but reasonable.

Coming back to the title of this posting, I'm not amused that such code gets past IBM quality control. If this is an example of the code found in the Lotus Notes Domino code stream, it is no wonder that the Notes clients became the slow and heavy tankers they are today.

I hope someone at IBM reads this posting.


12 comments:

  1. Making the Base64 encoding method return a NotesStream obviously requires some more changes within LSXSD.LSS. I will soon post a complete modified LSXSD.LSS for download. This code can be placed in a script library and then loaded with the Use statement rather than the %Include directive.

  2. I think I read a post by IBMer André Girard saying that he improved the base64 code to speed up the conversion, but I could be wrong.
    Send him a comment; I think he's the guy to ping.

  3. Fredrik, you are referring to the bookstore database on openntf.org. However, its encoding method has a problem too: in our environment it keeps the server CPU at 100% for a long time for files larger than a few KB. We have only tested the LotusScript version. The anomaly could be caused by the LotusScript version using Java libraries through the Java bridge library of Lotus Notes, but I'm not certain. I have not looked into the code yet, but will probably do so in the coming days. In any case, I have solved the problem.

  4. No I mean this post
    http://www-10.lotus.com/ldd/bpmpblog.nsf/dx/dxl-importing-issue-may-want-hotfix

  5. OK, thanks for the link. I'm not sure whether this is related to the other issue reported by IBM, for which I posted a link earlier...
    https://www-304.ibm.com/support/docview.wss?dc=DB550&rs=463&uid=swg1LO57765&context=SSKTWP&cs=utf-8&lang=en&loc=en_US
    I do not believe it relates to lsxsd.lss, because the issues there are not related to DXL import, and the problems do not start at 3MB but with basically any file larger than a few KB. The reason is a fundamental flaw in the way the classes in lsxsd.lss are implemented.

  6. In Notes 8.5 they added a notesStreamToBase64Ext method that uses an internal NotesStream method to handle the conversion instead of doing it in pure LotusScript. I suspect this is much faster.

  7. Julian, yes, there is indeed an internal NotesStream method, which unfortunately doesn't work... see my previous posting:

    http://flexdomino.blogspot.com/2011/03/aaargh-yet-another-set-back-in-our.html

  8. There's a nice, tight Java library out there, which I use via LS2J, that is an order of magnitude faster than using LS for Base64. Google "Base64" and "Mikael Grev".

    Using LS to Base64-encode a modest string of about 400 characters was taking about 1.2 seconds; the Java method comes in well under 100 ms. Incidentally, converting a bunch of other libraries that did string manipulation from LS to Java and then using them via LS2J gave similar performance improvements. They 'say' Java is as fast as LS in an agent, but I think that statement only holds for a specific subset of cases.

  9. Jerry, with my current modifications to lsxsd.lss I manage to convert the 3,267,584 bytes (3191 KB) of my test file in 20 seconds. If my math is correct, the improved LotusScript (!) algorithm converts 1KB of data in less than 0.0063 seconds, or 6.3 milliseconds. Good enough for me at this stage.

  10. ...And to give these statistics a frame of reference: the test was executed on an HP EliteBook 2730p (1.8GHz dual core, 8GB RAM, Windows 7 Ultimate 64-bit) running Domino for Windows 64-bit. Whilst it is a fast tablet, it's nowhere near a real server, which surely will perform much faster.

  11. Whilst working on improving the decoding method, I couldn't refrain from letting the XSD_DATATYPE_CONVERT.base64ToNotesStream method decode my 3MB test file. The server console recorded a start time of 19:44:38 and a finishing time of 23:04:48. That is a staggering 3 hours 20 minutes or approx. 1 hour per megabyte.

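
The figures in comments 9 and 11 can be checked quickly. The second line assumes the decoding test used the same 3191 KB file as comment 9, which comment 11 does not state explicitly.

$$\frac{3{,}267{,}584\ \text{B}}{1024\ \text{B/KB}} = 3191\ \text{KB}, \qquad \frac{20\ \text{s}}{3191\ \text{KB}} \approx 0.00627\ \text{s/KB} \approx 6.3\ \text{ms/KB}$$

$$23{:}04{:}48 - 19{:}44{:}38 = 3\ \text{h}\ 20\ \text{min}\ 10\ \text{s} = 12{,}010\ \text{s}, \qquad \frac{12{,}010\ \text{s}}{3191\ \text{KB} / 1024} \approx \frac{12{,}010\ \text{s}}{3.12\ \text{MB}} \approx 3850\ \text{s/MB} \approx 64\ \text{min/MB}$$

Under that assumption, the unmodified decoder (12,010 s) took about 600 times as long as the improved encoder (20 s) on the same amount of data.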