I need to do a 4-column lookup against a large table (1 million rows) that contains 4 different record types. The first lookup will match on columns A, B, C, and D. If no match is found, I try again with columns A, B, C, and '99' in column D. If no match, try again with columns A, B, D, and '99' in column C. Finally, if no match in any of the above, use column A, '99' in B, '99' in C, '99' in D. I will retrieve 2 columns from the lookup table.
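For what it's worth, the in-memory version of this fallback sequence is straightforward to sketch. Here's a minimal illustration using a dict keyed on the four columns; all names (`build_index`, `lookup`, the sample rows) are hypothetical, not from any actual package:

```python
# Sentinel value used in the lookup table to mean "match anything".
WILDCARD = 99

def build_index(rows):
    """Index the lookup table by its four key columns.
    Each row is (A, B, C, D, X, Y), where X and Y are the two
    columns to be retrieved."""
    return {(a, b, c, d): (x, y) for a, b, c, d, x, y in rows}

def lookup(index, a, b, c, d):
    """Try the four key combinations in order of decreasing specificity."""
    for key in ((a, b, c, d),                          # exact match
                (a, b, c, WILDCARD),                   # '99' in column D
                (a, b, WILDCARD, d),                   # '99' in column C
                (a, WILDCARD, WILDCARD, WILDCARD)):    # '99' in B, C, D
        if key in index:
            return index[key]
    return None  # no match at any level

# Tiny sample table for illustration only.
table = [
    (1, 2, 3, 4,  "x1", "y1"),
    (1, 2, 3, 99, "x2", "y2"),
    (1, 99, 99, 99, "x3", "y3"),
]
index = build_index(table)
print(lookup(index, 1, 2, 3, 4))   # exact match -> ('x1', 'y1')
print(lookup(index, 1, 2, 3, 7))   # falls back to D=99 -> ('x2', 'y2')
print(lookup(index, 1, 5, 6, 7))   # falls back to all-99 -> ('x3', 'y3')
```

The point is that one index supports all four passes, so you don't necessarily need four separate tables; each probe is a constant-time hash lookup.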
My thought is that breaking this sequence out into 4 different tables/lookups would be most efficient. The other option would be to write a script that handles this logic in a single transform with an in-memory table. My concern is that the table would be too large to load into memory.
Any ideas/suggestions would be appreciated.
You only need to pull in the 4 columns from the table that you will be looking up against. Assuming they are normal integers (i.e. 4 bytes each), you would need 4 * 4 * 1,000,000 = 16MB per lookup. That's not really all that much. My advice would be to give it a go and, if you're having problems, see where the bottlenecks are.
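That estimate is just the product of the three figures Jamie quotes; spelled out:

```python
rows = 1_000_000       # table size from the question
key_columns = 4        # columns A, B, C, D
bytes_per_int = 4      # assuming normal 4-byte integers

total_bytes = rows * key_columns * bytes_per_int
print(total_bytes)  # 16000000, i.e. the ~16MB quoted
```

The two retrieved columns add a bit on top of that, but the order of magnitude doesn't change.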
More on memory per package: http://blogs.conchango.com/jamiethomson/archive/2005/05/29/1486.aspx
-Jamie