Hello!
I need to verify that about 4 million rows were correctly inserted into
the database. Should I do this with one query returning 4 million rows,
or should I query the database 4 million times, returning one row per
query?
The result of the query/queries will be processed in a client application.
Does it matter?
I would prefer the latter, if it can finish in a reasonable amount of time.
Thanks!
Jon"Jon" <jon@.noreply> wrote in message
news:xn0f10jy5j55qzl00t@.news.microsoft.com...
> Hello!
> I need to verify that about 4 million rows were correctly inserted into
> the database. Should I do this with one query returning 4 million rows,
> or should I query the database 4 million times, returning one row per
> query?
>
Neither. Surely it would make far more sense to write a query that verifies
the result server-side. That way you need only return a Yes/No answer or
some other aggregate or exception result.
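For example, here is a minimal sketch of that kind of aggregate check; the
table name dbo.TargetTable, the key column Id, and the expected row count
are hypothetical placeholders:

-- Verify the total row count and key uniqueness in a single round trip.
SELECT CASE
           WHEN COUNT(*) = 4000000
            AND COUNT(DISTINCT Id) = 4000000
           THEN 'OK'
           ELSE 'Mismatch'
       END AS VerificationResult
FROM dbo.TargetTable;

A query like this returns one row to the client instead of 4 million.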
> The result of the query/queries will be processed in a client application.
> Does it matter?
> I would prefer the latter, if it can finish in a reasonable amount of
> time.
>
It matters! But if client-side row-by-row processing is an absolute
requirement then you'd better test performance for yourself. I don't know
what you consider reasonable.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Hi Jon
"Jon" wrote:
> Hello!
> I need to verify that about 4 million rows were correctly inserted into
> the database. Should I do this with one query returning 4 million rows,
> or should I query the database 4 million times, returning one row per
> query?
What do you mean by this? If you inserted 4 million rows, no error status
was returned, and the transaction was committed, then they will have been
inserted! If you need to make sure specific values are in a given column,
then make sure that you have the correct column/table constraints in place.
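For example, a minimal sketch of the kind of constraints meant here; the
table and column names are hypothetical:

-- Reject duplicate keys and obviously invalid values at insert time.
ALTER TABLE dbo.TargetTable
    ADD CONSTRAINT PK_TargetTable PRIMARY KEY (Id);

ALTER TABLE dbo.TargetTable
    ADD CONSTRAINT CK_TargetTable_Quantity CHECK (Quantity > 0);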
John
The data is not imported from another database; it comes from one or more
files that need to be processed before being inserted into the database. I
need to verify that the application doing this does the right thing (which
it currently doesn't). I am not really a pro at T-SQL (can I even read from
a file using T-SQL?), so I feel more comfortable doing it in a "normal"
application.
I have tried returning all rows in one query, and it takes about 20
minutes to verify the data. The problem is that when elements are missing
from the database, reading the next element from the file and the next
element from the database gives a mismatch, because I am simply comparing
the wrong items. I can of course "synchronize" this, but I want to keep
the development time down, so if the time difference is small, I may just
take the easy route and not complicate it (the more complicated my code
is, the greater the risk of me doing something wrong, which could mean
that the application verifying the insertion itself has a bug...).
Running time is not critical (it will only be used a few times), and I
would consider about 60 minutes still workable for verifying the complete
database.
David Portas wrote:
>"Jon" <jon@.noreply> wrote in message
>news:xn0f10jy5j55qzl00t@.news.microsoft.com...
>Neither. Surely it would make far more sense to write a query that
>verifies the result server-side. That way you need only return a Yes/No
>answer or some other aggregate or exception result.
>
>It matters! But if client-side row-by-row processing is an absolute
>requirement then you'd better test performance for yourself. I don't know
>what you consider reasonable.
Hi,
The insertion is done by another application that reads text files,
processes each line in the text file, and inserts it into the database.
Unfortunately, it does not log errors, and there can also be a bug in the
application (so I need to verify not only that the element exists, but
also that the data is correct).
Verifying that the data is correct cannot be done using a constraint,
because the application may have inserted the data from another element
into the database (for example by reading the wrong line in the file).
Jon
John Bell wrote:
>Hi Jon
>"Jon" wrote:
>
>What do you mean by this? If you inserted 4 million rows, no error status
>was returned, and the transaction was committed, then they will have been
>inserted! If you need to make sure specific values are in a given column,
>then make sure that you have the correct column/table constraints in place.
>John
Hi Jon
"Jon" wrote:
> Hi,
> The insertion is done by another application that reads text files,
> processes each line in the text file, and inserts it into the database.
> Unfortunately, it does not log errors, and there can also be a bug in the
> application (so I need to verify not only that the element exists, but
> also that the data is correct).
>
Have you considered BCP, which can create a file of the records that failed
to load? Possibly a DTS/SSIS package could handle this processing better?
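For example, here is a minimal sketch using BULK INSERT, the T-SQL
counterpart of the bcp utility; the file path, staging table name, and
terminators are hypothetical and would need to match your file format:

-- Load the text file into a staging table; rows that cannot be loaded are
-- written to the ERRORFILE instead of silently disappearing.
BULK INSERT dbo.StagingTable
FROM 'C:\data\input.txt'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    ERRORFILE       = 'C:\data\input_errors.txt',
    MAXERRORS       = 1000000
);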
> Verifying that the data is correct cannot be done using a constraint,
> because the application may have inserted the data from another element
> into the database (for example by reading the wrong line in the file).
It sounds like you need to improve the program that does the inserts or the
quality of the data!
> --
> Jon
>
John|||"Jon" <jon@.noreply> wrote in message
news:xn0f10m5sj87sy300v@.news.microsoft.com...
> Hi,
> The insertion is done by another application that reads text files,
> processes each line in the text file, and inserts it into the database.
> Unfortunately, it does not log errors, and there can also be a bug in the
> application (so I need to verify not only that the element exists, but
> also that the data is correct).
> Verifying that the data is correct cannot be done using a constraint,
> because the application may have inserted the data from another element
> into the database (for example by reading the wrong line in the file).
>
Maybe you could load the file to the server independently and then compare
the two results with a query. Perhaps that seems an odd idea given that you
already have an application doing the same. But if it's possible to bulk
load the file using BCP then I would expect you to see some improvement on
the 20 minute running time you mentioned. Take a look at the BCP topic in
Books Online.
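A minimal sketch of that comparison, assuming the file has been bulk loaded
into a hypothetical staging table dbo.FileStaging and the target table is
dbo.TargetTable (the column names are placeholders):

-- Rows present in the file but missing or altered in the target table.
SELECT Id, Col1, Col2 FROM dbo.FileStaging
EXCEPT
SELECT Id, Col1, Col2 FROM dbo.TargetTable;

-- Rows present in the target table that do not match any row in the file.
SELECT Id, Col1, Col2 FROM dbo.TargetTable
EXCEPT
SELECT Id, Col1, Col2 FROM dbo.FileStaging;

Two empty result sets would mean the two datasets match exactly.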
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Yes, the program will be improved. But to do that, we need to figure out
what is wrong, which is why I am doing this.
I don't think they are open to loading the data using a completely new
process. I think they want to stick with the original application.
John Bell wrote:
>Hi Jon
>"Jon" wrote:
>
>Have you considered BCP, which can create a file of the records that failed
>to load? Possibly a DTS/SSIS package could handle this processing better?
>
>It sounds like you need to improve the program that does the inserts or the
>quality of the data!
>John
Hi Jon
> Yes, the program will be improved. But to do that, we need to figure out
> what is wrong, which is why I am doing this.
If that is the case, then you will probably want to take a copy of the
database and not work on the live system!
John
No, it is not on a live system. But the problem still remains: I need to
somehow compare the data from the files with the data that has been loaded
into the database, and the question is how best to do that.
John Bell wrote:
>Hi Jon
>
>If that is the case, then you will probably want to take a copy of the
>database and not work on the live system!
>John