TableEquals performance

May 5, 2010 at 9:45 PM

Why don't you utilize a query construct that is scalable?  The code in Internal.GenerateComparisonSQLQuery utilizes joins to compare the data, which is the slowest method that could be used.  It magnifies as the size of the data increases and several hundred thousand rows is becoming a trivial data set,  Instead of doing all of the joining, you can instead do the following:

SELECT TOP 1 *
FROM (SELECT <columns>
        FROM TableA
        EXCEPT
        SELECT <columns>
        FROM TableB) a

SELECT TOP 1 *
FROM (SELECT <columns>
        FROM TableB
        EXCEPT
        SELECT <columns>
        FROM TableA) a

If either query results a row of data, the tables do not match.  This will run in a fraction of the time as the existing code.  It also involves much less code in the Internal.GenerateComparisonSQLQuery stored procedure.

 

May 9, 2010 at 3:13 PM

mhotek,

 Thank you for your suggestion. I am going to do this in the next version. And thank you for the “IsTableNotEmpty” assert – I going to also include that.

Aug 16, 2010 at 12:50 AM
Edited Aug 16, 2010 at 12:53 AM

Mhotek, 

Thank you for your suggestions.  

I included the IsTableNotEmpty in the last release (V 1.6)

I’ve put on hold implementing the EXCEPT solution. Here is my reasoning: Using a solution based on EXECPT will reduce the quality of the output. With the current implementation, in case of a comparison mismatch I can show details in the error message that a solution based on EXECPT will not be able to provide. The main scope of this tool is to verify correctness and that is usually done by verifying scenarios that use controlled sets of data. That means in the case of table comparison that one will most likely compare a relatively small number of rows (most likely less than thousands).

Granted, I can see cases where people will want to validate a new production procedure. They could do that with real data and use TST to compare the old result set vs. the new result set. In that case my current implementation could definitely use the performance improvements allowed by your suggestion. If this will be signaled as an issue by users I may provide an alternate API (something like Assert.FastTableEquals) and implement that based on EXECPT. The new API will have better performance but potentially less details in the error message. At this point I decided to postpone doing so until I have more signals from the users.