Hello,
Here is the benchmarking and verification program.
See comments for results and details and such.
This is not the end of the contest.
I might make more versions and benchmark them as well.
You are free to make more versions too.
Let me know and I include them as well.
// *** Begin of Program ***
program BenchmarkWriteLongwordBits;
{$APPTYPE CONSOLE}
{
Benchmark and Verification program for WriteLongwordBits routines/contest.
Version 0.01 and Version 0.02 created on 7 may 2008 by Skybuck Flying.
Skybuck Flying enters the contest with 4 routines.
Terje enters the contest with 1 routine.
Benchmark Performed on AMD X2 3800+ at full speed.
Optimizations on.
Range checking off.
Overflow checking off.
Internet Connection disabled.
Settings:
BufferSize := 640*480*3*70;
Results:
program started
Building verification bit string... done.
Benchmarking: 1 Name: WriteLongwordBitsA1B1v4 Verifieing... done.
Benchmarking: 2 Name: WriteLongwordSlowInt64 Verifieing... done.
Benchmarking: 3 Name: WriteLongwordBitsSpecialAsmV1 Verifieing... done.
Benchmarking: 4 Name: WriteLongwordBitsSpecialAsmV2 Verifieing... done.
Benchmarking: 5 Name: WriteLongwordBitsTerje2Fixed Verifieing... done.
Benchmark Results:
-------------------------------------------------------------------------------
Number: 1
Name: WriteLongwordBitsA1B1v4
Author: Skybuck Flying
Seconds: 1.04
Verified: TRUE
Number: 2
Name: WriteLongwordSlowInt64
Author: Skybuck Flying
Seconds: 1.42
Verified: TRUE
Number: 3
Name: WriteLongwordBitsSpecialAsmV1
Author: Skybuck Flying
Seconds: 1.64
Verified: TRUE
Number: 4
Name: WriteLongwordBitsSpecialAsmV2
Author: Skybuck Flying
Seconds: 1.42
Verified: TRUE
Number: 5
Name: WriteLongwordBitsTerje2Fixed
Author: Terje
Seconds: 1.14
Verified: TRUE
program finished
Comments by Skybuck:
Terje's version is pretty fast despite the fact that it uses a branch. His
version uses nice 32 bit instructions and no sub routine calls. Maybe his
memory alignment trick has something to do with it as well.
However my A1B1 version is still the fastest :)
I should try to make a memory aligned version as well to see if that helps
performance.
}
uses
SysUtils,
Windows;
// helper routines
function BitString( Para : pointer; Bits : longword ) : string;
var
vByteAddress : longword;
vBitAddress : longword;
vIndex : longword;
begin
pointer(result) := nil; // clear it just in case.
SetLength( result, Bits );
vIndex := 0;
while vIndex < Bits do
begin
vByteAddress := vIndex shr 3; // div 8
vBitAddress := vIndex and 7; // mod 8
byte(result[vIndex+1]) := ord('0');
if (Plongword(longword(Para) + vByteAddress)^ shr vBitAddress) and 1 = 1
then
begin
byte(result[vIndex+1]) := ord('1');
end;
vIndex := vIndex + 1;
end;
end;
function KeepLowBits( Value : longword; Bits : longword ) : longword;
inline;
begin
Result := Value; // 32 bits case.
if Bits <= 31 then
begin
Result := Result and not (4294967295 shl Bits); // shl instruction
limited
to 31.
end;
end;
function KeepHighBits( Value : longword; Bits : longword ) : longword;
inline;
begin
Result := Value; // 32 bits case.
if Bits <= 31 then // 0 to 31 bits case.
begin
Result := Result and not (4294967295 shr Bits); // shr instruction
limited
to 31.
end;
end;
function ****ftLeft( Left : longword; Right : Longword; ****ft : longword )
:
longword;
asm
shld eax, edx, cl
end;
function ****ftRight( Right : longword; Left : longword; ****ft : longword )
:
longword;
asm
shrd eax, edx, cl
end;
// prototype one as described by the specification
type
Troutine = procedure ( Value : longword; Bits : longword; DestAddress :
pointer; DestBitIndex : longword );
// Skybuck's Lightning Fast WriteLongwordBits A1B1
// 19 instructions when not inlined.
// 14 instructions when inlined.
// better algorithm and code, doesn't have the double garbage problem of
previous version :)
// very fast version I like it ! ;) :)
// there must be no garbage bits in the value... otherwise the next time
those garbage bits will
// be or-ed with the next data value ;) :)
procedure WriteLongwordBitsA1B1v4( Value : longword; Bits : longword;
DestAddress : pointer; DestBitIndex : longword );
begin
// calculate first destination byte
longword(DestAddress) := longword(DestAddress) + (DestBitIndex shr 3); //
div 8
// calculate ****ft value, DestBitIndex will now function as the ****ft
value.
DestBitIndex := DestBitIndex and 7; // mod 8
// or destination with byte 0 to 3
Plongword(DestAddress)^ := Plongword(DestAddress)^ or (Value shl
DestBitIndex);
// incrementation destination with 1 byte.
longword(DestAddress) := longword(DestAddress) + 1;
// or (this next) destination with byte 1 to 4
Plongword(DestAddress)^ := Plongword(DestAddress)^ or ( (Value shr 8) shl
DestBitIndex);
end;
// Skybuck's slow ass routine but it does do A2B2
// 89 instructions with optimizations on !!! holycow. it's the int64
stuff.
procedure WriteLongwordSlowInt64( Value : longword; BitCount : longword;
ToBase : pointer; ToBitPointer : longword );
var
Content : int64;
Mask : int64;
ByteIndex : longword;
BitIndex : byte;
begin
// copy value to a longword
Content := Value;
// calculate mask
Mask := not (18446744073709551615 shl BitCount);
// cut of accessive bits from content
Content := Content and Mask;
// ok now determine where the longword must be written to. which byte
ByteIndex := ToBitPointer shr 3; // div 8
// ok now determine at which bit it must be written to. which bit
position
BitIndex := ToBitPointer and 7; // mod 8
// now simply ****ft the mask and the content this ammount of bits
Mask := Mask shl BitIndex;
Content := Content shl BitIndex;
// now simply clear the bits first in the buffer which are to be
overwritten. clear it with the mask.
// first invert mask so the rest remains
int64( pointer(Longword(ToBase) + ByteIndex)^ ) := int64(
pointer(Longword(ToBase) + ByteIndex)^ ) and (not Mask);
// now simply or the content into it
int64( pointer(Longword(ToBase) + ByteIndex)^ ) := int64(
pointer(Longword(ToBase) + ByteIndex)^ ) or Content;
end;
// Skybuck's special asm version 1
// correct
procedure WriteLongwordBitsSpecialAsmV1( Value : longword; Bits :
longword;
DestAddress : pointer; DestBitIndex : longword );
var
vContent : longword;
vMask : longword;
v****ft : longword;
vFirstContent : longword;
vSecondContent : longword;
vFirstMask : longword;
vSecondMask : longword;
vFirstAddress : longword;
vSecondAddress : longword;
begin
vContent := KeepLowBits( Value, Bits );
vMask := KeepLowBits( 4294967295, Bits );
v****ft := DestBitIndex and 7;
vFirstContent := ****ftLeft( vContent, 0, v****ft );
vSecondContent := ****ftLeft( 0, vContent, v****ft );
vFirstMask := ****ftLeft( vMask, 0, v****ft );
vSecondMask := ****ftLeft( 0, vMask, v****ft );
vFirstAddress := longword(DestAddress) + (DestBitIndex shr 3); // div 32
vSecondAddress := vFirstAddress + 4;
Plongword(vFirstAddress)^ := (Plongword(vFirstAddress)^ and not
vFirstMask)
or vFirstContent;
Plongword(vSecondAddress)^ := (Plongword(vSecondAddress)^ and not
vSecondMask) or vSecondContent;
end;
// Skybuck's special asm version 2
procedure WriteLongwordBitsSpecialAsmV2( Value : longword; Bits :
longword;
DestAddress : pointer; DestBitIndex : longword );
var
vContent : longword;
vMask : longword;
v****ft : longword;
vFirstContent : longword;
vFirstMask : longword;
vFirstAddress : longword;
// recycle the variables above, little bit dangerous because
// compiler might be buggy, but so far it seems to be working.
vSecondContent : longword absolute vFirstContent;
vSecondMask : longword absolute vFirstMask;
vSecondAddress : longword absolute vFirstAddress;
begin
vContent := KeepLowBits( Value, Bits );
vMask := KeepLowBits( 4294967295, Bits );
v****ft := DestBitIndex and 7; // mod 8
DestBitIndex := DestBitIndex shr 3; // div 8
vFirstContent := ****ftLeft( vContent, 0, v****ft );
vFirstMask := ****ftLeft( vMask, 0, v****ft );
vFirstAddress := longword(DestAddress) + DestBitIndex;
Plongword(vFirstAddress)^ := (Plongword(vFirstAddress)^ and not
vFirstMask)
or vFirstContent;
vSecondContent := ****ftLeft( 0, vContent, v****ft );
vSecondMask := ****ftLeft( 0, vMask, v****ft );
vSecondAddress := longword(DestAddress) + DestBitIndex + 4;
Plongword(vSecondAddress)^ := (Plongword(vSecondAddress)^ and not
vSecondMask) or vSecondContent;
end;
// Terje's Version 2 Fixed
procedure WriteLongwordBitsTerje2Fixed
(
Value : longword;
BitCount : longword;
DestAddress : pointer;
DestBitIndex : longword
);
var
vLastBit : longword;
vBottomMask : longword;
vTopMask : longword;
begin
// Normalize bit index
Inc( Plongword(DestAddress), DestBitIndex shr 5 ); // compiler magic at
work, I don't like it but ok.
DestBitIndex := DestBitIndex and 31;
// Index of last inserted bit:
vLastBit := DestBitIndex + BitCount - 1;
// Generate a mask of 0 bits below the first inserted bit */
vBottomMask := ( longword(-1) ) shl DestBitIndex;
// There must be 0 to 31 unmodified bits after the last inserted: */
vTopMask := ( longword(-1) ) shr (31-(vLastBit and 31));
// One or two affected words? */
if (vLastBit >= 32) then
begin
Plongword(DestAddress)^ := (Plongword(DestAddress)^ and not vBottomMask)
or (Value shl DestBitIndex);
Longword(DestAddress) := Longword(DestAddress) + 4;
// Since we modify two words, DestBitIndex MUST be > 0! */
Plongword(DestAddress)^ := (Plongword(DestAddress)^ and not vTopMask) or
(Value shr (32-DestBitIndex));
end else
begin
// It all fits in a single word !
Plongword(DestAddress)^ := (Plongword(DestAddress)^ and not (vBottomMask
and vTopMask)) or (Value shl DestBitIndex); // topmask incorrect.
end;
end;
type
TbenchmarkInfo = packed record
mNumber : longword;
mName : string;
mAuthor : string;
mRoutine : Troutine;
mSeconds : double;
mBitString : string;
mVerified : boolean;
end;
var
mBufferSize : longword;
mBuffer : pointer;
mFrequency : int64;
mVerificationBitString : string;
mBenchmarkCount : longword;
mBenchmarkInfo : array of TbenchmarkInfo;
// 1000 for extra buffer overrun safety space ;)
procedure AllocateBuffer;
begin
GetMem( mBuffer, mBufferSize + 1000 ); // 64.512.000 bytes
end;
procedure ClearBuffer;
begin
FillChar( mBuffer^, mBufferSize, 0 );
end;
procedure FreeBuffer;
begin
FreeMem( mBuffer, mBufferSize + 1000 );
end;
procedure AllocateBenchmarkInfo;
begin
SetLength( mBenchmarkInfo, mBenchmarkCount );
end;
procedure FreeBenchmarkInfo;
begin
mBenchmarkInfo := nil;
end;
function BitLength( Value : longword ) : byte; inline;
begin
result := 0;
while (Value > 0) do
begin
result := result + 1;
Value := Value shr 1;
end;
if result = 0 then
begin
result := 1;
end;
end;
procedure BuildVerificationString;
var
vValue : longword;
vBitCount : longword;
vDestBitIndex : longword;
vDestBitSize : longword;
begin
write('Building verification bit string... ');
// clear buffer
ClearBuffer;
// set randseed
RandSeed := 12345678;
vDestBitIndex := 0;
vDestBitSize := mBufferSize * 8;
while vDestBitIndex < vDestBitSize do
begin
vValue := Random(2147483647);
// vBitCount := 1 + Random(32);
vBitCount := BitLength( vValue );
WriteLongwordSlowInt64( vValue, vBitCount, mBuffer, vDestBitIndex );
vDestBitIndex := vDestBitIndex + vBitCount;
end;
pointer(mVerificationBitString) := nil;
mVerificationBitString := BitString( mBuffer, mBufferSize * 8 );
writeln( 'done.');
end;
procedure Benchmark( var BenchmarkInfo : TbenchmarkInfo );
var
vTick1 : int64;
vTick2 : int64;
vInterval : int64;
vValue : longword;
vBitCount : longword;
vDestBitIndex : longword;
vDestBitSize : longword;
begin
if not Assigned(BenchmarkInfo.mRoutine) then exit;
write('Benchmarking: ', BenchmarkInfo.mNumber, ' Name: ',
BenchmarkInfo.mName );
// clear buffer
ClearBuffer;
// set randseed
RandSeed := 12345678;
// sleep( 1000 ) for full thread time slice
sleep( 1000 );
QueryPerformanceCounter( vTick1 );
vDestBitIndex := 0;
vDestBitSize := mBufferSize * 8;
while vDestBitIndex < vDestBitSize do
begin
vValue := Random(2147483647);
// vBitCount := 1 + Random(32);
vBitCount := BitLength( vValue );
BenchmarkInfo.mRoutine( vValue, vBitCount, mBuffer, vDestBitIndex );
vDestBitIndex := vDestBitIndex + vBitCount;
end;
QueryPerformanceCounter( vTick2 );
vInterval := vTick2 - vTick1;
BenchmarkInfo.mSeconds := vInterval / mFrequency;
write( ' Verifieing... ');
BenchmarkInfo.mBitString := BitString( mBuffer, mBufferSize * 8 );
BenchmarkInfo.mVerified := false;
if BenchmarkInfo.mBitString = mVerificationBitString then
begin
BenchmarkInfo.mVerified := true;
end;
// cleared after verification to save memory.
BenchmarkInfo.mBitString := '';
writeln(' done.');
end;
procedure PerformBenchmarks;
var
vIndex : longword;
begin
// perform benchmarks
vIndex := 0;
while vIndex < mBenchmarkCount do
begin
Benchmark( mBenchmarkInfo[vIndex] );
vIndex := vIndex + 1;
end;
end;
procedure DisplayBenchmarkResults;
var
vIndex : longword;
begin
writeln('Benchmark Results:');
writeln('-------------------------------------------------------------------------------');
// perform benchmarks vIndex := 0; while vIndex < mBenchmarkCount do begin
writeln( 'Number: ', mBenchmarkInfo[vIndex].mNumber ); writeln( 'Name: ',
mBenchmarkInfo[vIndex].mName ); writeln( 'Author: ',
mBenchmarkInfo[vIndex].mAuthor ); writeln( 'Seconds: ',
mBenchmarkInfo[vIndex].mSeconds:16:2 ); writeln( 'Verified: ',
mBenchmarkInfo[vIndex].mVerified ); writeln; vIndex := vIndex + 1;
end;end;procedure Main;begin writeln('program started'); // get benchmark
frequency QueryPerformanceFrequency( mFrequency ); // set buffer size
mBufferSize := 640*480*3*70; // allocate buffer AllocateBuffer; // set
benchmarks mBenchmarkCount := 5; // allocate benchmarkInfo
AllocateBenchmarkInfo; // setup benchmarks with mBenchmarkInfo[0] do begin
mNumber := 1; mName := 'WriteLongwordBitsA1B1v4'; mAuthor := 'Skybuck
Flying'; mRoutine := WriteLongwordBitsA1B1v4; end; with mBenchmarkInfo[1]
do begin mNumber := 2; mName := 'WriteLongwordSlowInt64'; mAuthor :=
'Skybuck Flying'; mRoutine := WriteLongwordSlowInt64; end; with
mBenchmarkInfo[2] do begin mNumber := 3; mName :=
'WriteLongwordBitsSpecialAsmV1'; mAuthor := 'Skybuck Flying'; mRoutine
:= WriteLongwordBitsSpecialAsmV1; end; with mBenchmarkInfo[3] do begin
mNumber := 4; mName := 'WriteLongwordBitsSpecialAsmV2'; mAuthor :=
'Skybuck Flying'; mRoutine := WriteLongwordBitsSpecialAsmV2; end; with
mBenchmarkInfo[4] do begin mNumber := 5; mName :=
'WriteLongwordBitsTerje2Fixed'; mAuthor := 'Terje'; mRoutine :=
WriteLongwordBitsTerje2Fixed; end; // build verification string must be
done after buffer is set up BuildVerificationString; // perform
bennchmarks PerformBenchmarks; // display benchmark results
DisplayBenchmarkResults; // free benchmark info FreeBenchmarkInfo; // free
buffer FreeBuffer; writeln('program finished');end;begin try Main; except
on E:Exception do Writeln(E.Classname, ': ', E.Message); end;
readln;end.// *** End of Program ***Bye, Skybuck.


|