Tuesday, March 27, 2012

A cursor performance-related question

Hi All,
I have 800 K records that need to be processed one by one in a large table
daily, using a cursor to update the default columns. The rest of the queries
only read information from this table. The number of records in this table
could become very large later on.
My question is: what's the best way to use a cursor to deal with this table?
Here are my options:
1. Use a BIG cursor to lock all the un-processed records and use a single
connection from Query Analyzer.
2. Try to break/subgroup them with flags and run the same procedure above
with multiple Query Analyzer connections. Each connection deals with only
part of the record set.
I only have one SQL Server. The testing result is that Option 1 has the best
performance.
Is SQL Server not good at running queries in parallel with multiple
connections? Or do I need to improve the SQL Server hardware, e.g. by adding
more memory?
Can any expert point me in the right direction?
Many Thanks,
Steve

news.microsoft.com (stevenxiu@.yahoo.com) writes:
> I have 800 K records that need to be processed one by one in a large
> table daily, using a cursor to update the default columns. The rest of
> the queries only read information from this table. The number of records
> in this table could become very large later on.
> My question is: what's the best way to use a cursor to deal with this
> table? Here are my options:
> 1. Use a BIG cursor to lock all the un-processed records and use a single
> connection from Query Analyzer.
> 2. Try to break/subgroup them with flags and run the same procedure
> above with multiple Query Analyzer connections. Each connection deals
> with only part of the record set.
> I only have one SQL Server. The testing result is that Option 1 has the
> best performance.
> Is SQL Server not good at running queries in parallel with multiple
> connections? Or do I need to improve the SQL Server hardware, e.g. by
> adding more memory?
With only this abstract narrative it is impossible to say very much. One
possibility is that you have poor indexing, causing the multiple connections
to block each other.
To get an accurate response you would need to post:
o CREATE TABLE and CREATE INDEX statements for the table.
o The code for the two options you are using.
o Some background on what the code is actually doing.
However, there is a fair chance that the answer is option 0: don't use a
cursor at all, but apply set-based logic. This usually improves performance
by orders of magnitude, as in the sketch below.
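As a rough illustration only - we don't know your actual table, so the table
and column names here are invented:

   -- Hypothetical table dbo.BigTable with a processed_flag and the
   -- "default" columns the nightly job is supposed to fill in.
   -- One set-based statement instead of 800 K single-row updates:
   UPDATE dbo.BigTable
   SET    default_col1 = 'some value',
          default_col2 = GETDATE(),
          processed_flag = 1
   WHERE  processed_flag = 0

If touching 800 K rows in one transaction causes too much blocking or log
growth, the same statement can be run in batches, for instance with SET
ROWCOUNT 10000 in a loop on SQL 2000, or UPDATE TOP (10000) on SQL 2005.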
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

"news.microsoft.com" <stevenxiu@.yahoo.com> wrote in message
news:u9WHOCx8FHA.3044@.TK2MSFTNGP10.phx.gbl...
> I have 800 K records that need to be processed one by one in a large
> table daily, using a cursor to update the default columns.
Chances are that you don't need to do it "one by one" and that it will be
more efficient without a cursor. We won't know for sure unless you post a
better description of your problem. See:
http://www.aspfaq.com/etiquette.asp?id=5006
David Portas
SQL Server MVP
--

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
But based on a few decades with SQL, I have written only five cursors
and I know that I could have avoided three of them if I had the CASE
expression back in the old days.
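For instance (just a sketch with invented names, not your schema), per-row
IF/ELSE logic that would otherwise live inside a cursor loop can usually be
folded into a single statement with CASE:

   -- hypothetical conditional update, done in one pass instead of a cursor
   UPDATE Foobar
   SET discount_rate = CASE
                          WHEN order_amt >= 1000.00 THEN 0.10
                          WHEN order_amt >= 500.00 THEN 0.05
                          ELSE 0.00
                       END
   WHERE processed_flag = 'N';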
The fact that you do not know the difference between records and rows
tells me your approach is probably not relational and that your
mindset is still stuck in sequential file processing.

You should avoid using cursors to perform updates. Set-based updates
perform better for a number of reasons:
(1) Set-based updates write to the transaction log more efficiently. Each
update incurs a certain amount of overhead in the transaction log to mark
the start and end of each write to each affected object in the database.
This includes not only writes to the heap or clustered index, but also
writes to each nonclustered index. In addition, if the same index page is
updated more than once during the thousands of individual updates, then that
page will be recorded in the transaction log once for each related update,
causing the log to grow faster than is necessary. Frequent disk allocations
can hugely affect performance. (This last shouldn't be a problem if you
pre-allocate log space.)
(2) With set-based updates, indexes can be updated en masse--meaning fewer
costly page splits and fewer writes to each affected database object, thus
reducing the frequency of disk head seeks.
(3) Locking is more efficient. Set-based updates obtain locks on all of the
affected rows before beginning the write, and then release them as soon as
the changes have been committed. Thousands of individual updates require
the server to go through the process of obtaining each individual exclusive
lock on each individual row. With set-based updates, locks are more likely
to be escalated when necessary, thereby reducing overhead.
(4) With set-based updates, the system is tasked with writing and writing
only. Thousands of individual updates usually mean that several additional
reads are interspersed within the writes. This can cause a lot more disk
activity and, in particular, a significant increase in costly disk seeks.
If you must use a cursor (I'm not of the opinion that they are always bad;
however, they should only be used as a last resort), then you should cache
the updates in a temp table or table variable and then flush them using
set-based updates. Inserting individual rows into a table with no indexes
(a heap) or appending to a table with only a clustered index performs pretty
well, and if you have enough memory, temp tables and table variables remain
for the most part in memory. (Writes to tempdb are eventually flushed out
to disk, provided the affected rows and objects still exist by the time the
system gets around to initiating the write.) By appending, I mean that
inserts occur in the same order as the clustered index key--that is, with an
ascending index, each row inserted has a key value that is greater than the
key value in any existing row in the table.
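A rough outline of that pattern - only a sketch, with an invented table
dbo.BigTable and invented column names; the per-row logic is a placeholder
for whatever you cannot express in set-based form:

   -- cache the per-row results in a temp table...
   CREATE TABLE #work (pk_id int NOT NULL PRIMARY KEY,
                       new_value varchar(50) NOT NULL)

   DECLARE @pk int, @val varchar(50)
   DECLARE cur CURSOR FAST_FORWARD FOR
       SELECT pk_id FROM dbo.BigTable WHERE processed_flag = 0
   OPEN cur
   FETCH NEXT FROM cur INTO @pk
   WHILE @@FETCH_STATUS = 0
   BEGIN
       -- per-row computation goes here
       SET @val = 'computed value for ' + CAST(@pk AS varchar(10))
       INSERT INTO #work (pk_id, new_value) VALUES (@pk, @val)
       FETCH NEXT FROM cur INTO @pk
   END
   CLOSE cur
   DEALLOCATE cur

   -- ...then flush the cached changes with one set-based update
   UPDATE b
   SET    b.default_col1 = w.new_value,
          b.processed_flag = 1
   FROM   dbo.BigTable b
   JOIN   #work w ON w.pk_id = b.pk_id

   DROP TABLE #work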
"news.microsoft.com" <stevenxiu@.yahoo.com> wrote in message
news:u9WHOCx8FHA.3044@.TK2MSFTNGP10.phx.gbl...
> Hi All,
> I have 800 K records need to be processed one by one in a large table
> daily
> using cursor way to update the default columns. The rest of queries are
> only
> read information from this table. The records in this table could be very
> large later on.
> My question is what's the best way to use cursor to duel with this table.
> here is my options:
> 1.Use a BIG cursor to lock all the un-processed records and use a singal
> connection from Query Analyzer
> 2. Try to break /subgroup them with flags and run the same procedure above
> with muti-Query Analyzer Connections. Each connection only duel part of
> the
> record set.
> I only have one SQL server . The testing result is Option 1 has the best
> performance.
> Is SQL server not good at running query parallelly with muti-connections?
> or
> I need to improve the SQL server hardware staff by adding more memory ?
> Any expert can point me to the right way?
> Many Thanks,
> Steve
>sql
