Tuesday, March 27, 2012

A difficult Combining Rows problem

Greetings,
I'm working to combine rows based on a time window and I am hoping to
be able to write a stored procedure to do this for me, rather than have
parse through all this data in my program. I'm not very well versed
with T-SQL syntax.. just enough to get by selecting using inner joins,
updating and inserting... thats about it. (Hence why I am here.)
The raw data I have below looks like this:
groupID, StartTime, EndTime, Min, Max, Points
----
1, 2005-10-05 06:00, 2005-10-05 06:14:59, 7, 32, 13
1, 2005-10-05 06:15, 2005-10-05 06:29:59, 5, 29, 6
1, 2005-10-05 06:30, 2005-10-05 06:44:59, 5, 28, 4
1, 2005-10-05 06:45, 2005-10-05 06:59:59, 5, 29, 16
1, 2005-10-05 07:00, 2005-10-05 07:14:59, 5, 23, 13
1, 2005-10-05 07:15, 2005-10-05 07:29:59, 5, 25, 18
1, 2005-10-05 07:30, 2005-10-05 07:44:59, 5, 34, 49
1, 2005-10-05 07:45, 2005-10-05 07:59:59, 5, 31, 49
Pretty straight forward; you can see each entry is a 15 minute time
interval. What I want to be able to do is to use a view or a stored
procedure to view this in one hour chunks, like below:
groupID, StartTime, EndTime, Min, Max, Points
----
1, 2005-10-05 06:00, 2005-10-05 06:59:59, 5, 32, 39
1, 2005-10-05 07:00, 2005-10-05 07:59:59, 5, 34, 129
This involves several things:
- Recognizing that there are variable # of rows (maybe we only have 3
15 minute entries instead of 4)
- Getting a min of those row's min column
- Getting a max of those row's max column
- Getting a total for those row's points column
- Input to any view or whatver would be based on the startTime and
endTime and would always be in whole hours.
I have a feeling that I am going to be doing this all in the C# .NET
end of things, but it's at least worth a shot asking all of you SQL
experts. What I am basically interested in knowing is, do you all
think that this is possible using views or stored procedures or
something else I don't know about. I didn't even know about views
until i started researching how to do this.
Any ideas? Is this possible? Should I just give up and do it on the
C# end of things? Seems to me that it might be possible to do in a
stored procedure, but possible not worth my time. I aprpeciate any
help or suggestions.
JasonTry this:
SELECT groupid,
MIN(DATEADD(HH,DATEDIFF(HH,'20050101',st
arttime),'20050101')),
MIN(DATEADD(HH,DATEDIFF(HH,'20050101',st
arttime),'2005-01-01T00:59:59')),
MIN(min), MAX(max), SUM(points)
FROM tbl
GROUP BY groupid, DATEDIFF(HH,'20050101',starttime) ;
David Portas
SQL Server MVP
--|||Hi
Check out the dateadd/datepart functions in Books Online for rounding times.
Try:
SELECT GROUPID, DATEADD(mi,-DATEPART(mi,Starttime),Starttime) AS StartTime,
DATEADD(ms,-3,DATEADD(hh,1,DATEADD(mi,-DATEPART(mi,Starttime),Starttime)))
AS EndTime,
Min([Min]), Max([Max]), SUM([Points])
FROM Readings
GROUP BY GroupId,
DATEADD(mi,-DATEPART(mi,Starttime),Starttime),
DATEADD(ms,-3,DATEADD(hh,1,DATEADD(mi,-DATEPART(mi,Starttime),Starttime)))
John
"Factor" wrote:

> Greetings,
> I'm working to combine rows based on a time window and I am hoping to
> be able to write a stored procedure to do this for me, rather than have
> parse through all this data in my program. I'm not very well versed
> with T-SQL syntax.. just enough to get by selecting using inner joins,
> updating and inserting... thats about it. (Hence why I am here.)
> The raw data I have below looks like this:
> groupID, StartTime, EndTime, Min, Max, Points
> ----
> 1, 2005-10-05 06:00, 2005-10-05 06:14:59, 7, 32, 13
> 1, 2005-10-05 06:15, 2005-10-05 06:29:59, 5, 29, 6
> 1, 2005-10-05 06:30, 2005-10-05 06:44:59, 5, 28, 4
> 1, 2005-10-05 06:45, 2005-10-05 06:59:59, 5, 29, 16
> 1, 2005-10-05 07:00, 2005-10-05 07:14:59, 5, 23, 13
> 1, 2005-10-05 07:15, 2005-10-05 07:29:59, 5, 25, 18
> 1, 2005-10-05 07:30, 2005-10-05 07:44:59, 5, 34, 49
> 1, 2005-10-05 07:45, 2005-10-05 07:59:59, 5, 31, 49
> Pretty straight forward; you can see each entry is a 15 minute time
> interval. What I want to be able to do is to use a view or a stored
> procedure to view this in one hour chunks, like below:
> groupID, StartTime, EndTime, Min, Max, Points
> ----
> 1, 2005-10-05 06:00, 2005-10-05 06:59:59, 5, 32, 39
> 1, 2005-10-05 07:00, 2005-10-05 07:59:59, 5, 34, 129
> This involves several things:
> - Recognizing that there are variable # of rows (maybe we only have 3
> 15 minute entries instead of 4)
> - Getting a min of those row's min column
> - Getting a max of those row's max column
> - Getting a total for those row's points column
> - Input to any view or whatver would be based on the startTime and
> endTime and would always be in whole hours.
> I have a feeling that I am going to be doing this all in the C# .NET
> end of things, but it's at least worth a shot asking all of you SQL
> experts. What I am basically interested in knowing is, do you all
> think that this is possible using views or stored procedures or
> something else I don't know about. I didn't even know about views
> until i started researching how to do this.
> Any ideas? Is this possible? Should I just give up and do it on the
> C# end of things? Seems to me that it might be possible to do in a
> stored procedure, but possible not worth my time. I aprpeciate any
> help or suggestions.
> Jason
>|||John Bell and David Portas,
I will have to read up on these Dateadd/DatePart parameters an actually
interpret what is going on within these statements, but just from what
you gave me here it looks like this will work out very well, and I
really appreciate the insight. This will allow me to vary that time
window fairly easily I do believe, all on a SQL call (that's much
better than bringing back all the data and parsing through it all it.
Thanks again,
Jason|||John
I have read over those functions and I now understand what they do and
how to use them, but I am still as to why the min / max /
total fields actually work. I assume it has something to do with the
GROUP BY statements, but again, I don't know why.
Assuming black magic happens and thats just how it works, I should just
be able to change those hh,1 to hh,4 and get 4 hour increments instead.
When I do that, the Starttime and Endtime values do return correctly
(although I do get an entry for 8-12, 9-1, 10-2, etc... thats fine) but
the MIN/MAX/SUM stuff is still reflective of the 1 hour timing.. so
that black magic that is limiting the MIN/MAX/SUM to one hour is still
limiting them to one hour even with the altered start and end times.
I'm unsure how to fix or get around this because I don't yet understand
what is limiting that max to an hour in the first place. How does this
work? I've been tripped up GROUP BY things before, it's my kryptonite
for some reason.
Hope that is not too confusing, I'm all jumbled in my head.
I really apprecaite the help with this so far, you've all been
wonderful.
Jason|||John
I have read over those functions and I now understand what they do and
how to use them, but I am still as to why the min / max /
total fields actually work. I assume it has something to do with the
GROUP BY statements, but again, I don't know why.
Assuming black magic happens and thats just how it works, I should just
be able to change those hh,1 to hh,4 and get 4 hour increments instead.
When I do that, the Starttime and Endtime values do return correctly
(although I do get an entry for 8-12, 9-1, 10-2, etc... thats fine) but
the MIN/MAX/SUM stuff is still reflective of the 1 hour timing.. so
that black magic that is limiting the MIN/MAX/SUM to one hour is still
limiting them to one hour even with the altered start and end times.
I'm unsure how to fix or get around this because I don't yet understand
what is limiting that max to an hour in the first place. How does this
work? I've been tripped up GROUP BY things before, it's my kryptonite
for some reason.
Hope that is not too confusing, I'm all jumbled in my head.
I really apprecaite the help with this so far, you've all been
wonderful.
Jason|||On 10 Nov 2005 11:41:54 -0800, Factor wrote:

>John
>I have read over those functions and I now understand what they do and
>how to use them, but I am still as to why the min / max /
>total fields actually work. I assume it has something to do with the
>GROUP BY statements, but again, I don't know why.
Hi Jason,
Correct. The GROUP BY tells SQL Server to combine the data from several
rows into one row. This is normally used to report totals, minimum,
maximum per project, per section, etc. But with the appropriate
expression, it cal also be used to combine rows that fit in the same
"period" into one group.
Though John's and David's versions both work, I suggest you go with
Davids version, as this is more flexible. (And, once you get your head
around it, easier to understand as well).
Basically, John's version works by taking each of the date parts you
want to disregard (milliseconds, seconds, minutes), then subtracting
that amount of time from the Starttime. The end result will of course be
the last full hour equal to or before Starttime.
David's version works the other way around - it calculates the number of
full hours that have elapsed since a chosen anchor date, then adds that
number to the chosen anchor date. The result will be the same as John's
expression.
(Note: David chose to just use the number of hours for the group by, and
add it back to the anchor date in the SELECT clause only)

>Assuming black magic happens and thats just how it works, I should just
>be able to change those hh,1 to hh,4 and get 4 hour increments instead.
No. I'll give you two examples how to modify David's query to report on
4-hour intervals and to report on 1/2-hour intervals.
For 4-hour intervals, again calculate the number of hours since an
anchor date. Divide by 4 and truncate, then multiply by 4 again. Add
this number of hours to the anchor date. There you have the start of the
last 4-hour interval
SELECT groupid,
MIN(DATEADD(hour,
4 * (DATEDIFF(hour, '20050101', Starttime) / 4),
'20050101')),
MIN(DATEADD(hour,
4 * (DATEDIFF(hour, '20050101', Starttime) / 4),
'2005-01-01T03:59:59')),
MIN(min), MAX(max), SUM(points)
FROM tbl
GROUP BY groupid, DATEDIFF(hour, '20050101', Starttime) / 4;
For 1/2-hour intervals, we can't divide the number of hours sice the
anchor date by 0.5, as that won't give us back the precision we already
lost. Instead, we'll have to calculate minutes and divide by 30:
SELECT groupid,
MIN(DATEADD(minute,
30 * (DATEDIFF(minute, '20050101', Starttime) /30),
'20050101')),
MIN(DATEADD(minute,
30 * (DATEDIFF(minute, '20050101', Starttime) /30),
'2005-01-01T00:29:59')),
MIN(min), MAX(max), SUM(points)
FROM tbl
GROUP BY groupid, DATEDIFF(minute, '20050101', Starttime) /30;
In both cases, don't forget to change the shifted anchor value in the
expression for the end point of the interval. Instead of using the same
anchor date, then adding 30 minunte or 4 hours minus one second, the
anchor date is shifted by 30 minutes or 4 hours minus one second.
Now, the above code can still be simplified further. If your table
always has the complete data (as your sample roiws indicate), then you
could change the above queries to:
SELECT groupid,
MIN(StartTime), MAX(EndTime),
MIN(min), MAX(max), SUM(points)
FROM tbl
GROUP BY groupid, DATEDIFF(minute, '20050101', Starttime) /30;
-- or: GROUP BY groupid, DATEDIFF(hour, '20050101', Starttime) / 4;
Note that this might show "holes" in the periods if your real data is
not as complete as the sample you posted indicates. But the advantage is
that you get rid of the "shifted" anchor date for calculating end time.
Final step would be to put it in a stored procedure and use a parameter
for the interval length (in minutes):
SELECT groupid,
MIN(StartTime), MAX(EndTime),
MIN(min), MAX(max), SUM(points)
FROM tbl
GROUP BY groupid, DATEDIFF(minute, '20050101', Starttime) / @.Interval;
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hi Jason,
What David Provided is an Excellent query .
Let us see if this query can help you.
Select GID , Min(STime) , Max(ETime) ,
Min(Minimum),Max(Maximum),Sum(Points)[co
lor=darkred]
>From yourTableName Group By[/color]
GID,Convert(varchar,STime,112),DatePart(
hh,STime)
Having same name as Functions/ Keyword sound confusing to me so I
changed them.
With Warm Regards
Jatinder Singh|||Hi
This is easier with David's method (see Hugo's reply for an explanation).
Dividing the number of hours by 4 and dropping the remainder will give you 4
hour chunks when they are multiplied back up. You also need to change the en
d
time to give a 4 hour gap.
SELECT groupid,
MIN(DATEADD(HH,
4*(DATEDIFF(HH,'20050101',starttime)/4),'20050101')
) AS Starttime,
MAX(DATEADD(HH,
4*(DATEDIFF(HH,'20050101',starttime)/4),'2005-01-01T03:59:59')
) AS Endtime,
MIN(min) AS [Min],
MAX(max) AS [Max],
SUM(points) AS [Total Points]
FROM Readings
GROUP BY groupid,
4*(DATEDIFF(HH,'20050101',starttime)/4)
John
"Factor" wrote:

> John
> I have read over those functions and I now understand what they do and
> how to use them, but I am still as to why the min / max /
> total fields actually work. I assume it has something to do with the
> GROUP BY statements, but again, I don't know why.
> Assuming black magic happens and thats just how it works, I should just
> be able to change those hh,1 to hh,4 and get 4 hour increments instead.
> When I do that, the Starttime and Endtime values do return correctly
> (although I do get an entry for 8-12, 9-1, 10-2, etc... thats fine) but
> the MIN/MAX/SUM stuff is still reflective of the 1 hour timing.. so
> that black magic that is limiting the MIN/MAX/SUM to one hour is still
> limiting them to one hour even with the altered start and end times.
> I'm unsure how to fix or get around this because I don't yet understand
> what is limiting that max to an hour in the first place. How does this
> work? I've been tripped up GROUP BY things before, it's my kryptonite
> for some reason.
> Hope that is not too confusing, I'm all jumbled in my head.
> I really apprecaite the help with this so far, you've all been
> wonderful.
> Jason
>|||Wondeful! Lots ot take in, I thank everyone for their help. I've made
a lot of progress and I've learned a TON about SQL int he past two
days.
I hope I can help you all in the future with something!
Thanks again,
Jason

No comments:

Post a Comment