Author |
Topic |
|
Podge
Support Moderator
Ireland
3775 Posts |
Posted - 06 June 2008 : 05:38:20
|
As a result of installing the full text search on a forum I have discovered multiple duplicate rows on the FORUM_A_REPLY table. Some have 2, 6 or even 8 occurrences.
After reading this - http://www.sqlteam.com/article/deleting-duplicate-records
I tried this but it won't work because you cannot use DISTINCT with text or ntext columnssp_rename 'FORUM_A_REPLY', 'TEMP_FORUM_A_REPLY'
select distinct *
into FORUM_A_REPLY
from TEMP_FORUM_A_REPLY
drop table TEMP_FORUM_A_REPLY Am I right in thinking that the first step is going to be to create an identity column? or will this T-SQL work ?
DECLARE @Count int
DECLARE @REPLY_ID nvarchar(50)
DECLARE duplicate_cursor CURSOR FAST_FORWARD FOR
SELECT REPLY_ID, Count(*) - 1
FROM FORUM_A_REPLY
GROUP BY REPLY_ID
HAVING Count(*) > 1
OPEN duplicate_cursor
FETCH NEXT FROM duplicate_cursor INTO @REPLY_ID, @Count
WHILE @@FETCH_STATUS = 0
BEGIN
SET ROWCOUNT @Count
DELETE FROM FORUM_A_REPLY WHERE REPLY_ID = @REPLY_ID
SET ROWCOUNT 0
FETCH NEXT FROM duplicate_cursor INTO @REPLY_ID, @Count
END
CLOSE duplicate_cursor
DEALLOCATE duplicate_cursor
I can always check if the above code eliminates duplicates however I afraid it may delete non-duplicates also. Any advice (apart from backing everything up before trying it). How can I check I have the correct amount of reply's after it executes ?< |
Podge.
The Hunger Site - Click to donate free food | My Blog | Snitz 3.4.05 AutoInstall (Beta!)
My Mods: CAPTCHA Mod | GateKeeper Mod Tutorial: Enable subscriptions on your board
Warning: The post above or below may contain nuts. |
|
HuwR
Forum Admin
United Kingdom
20584 Posts |
Posted - 06 June 2008 : 05:51:29
|
quote: How can I check I have the correct amount of reply's after it executes ?
you should be able to use the reply_count from the topic to ascertain how many replies there should be< |
|
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
Posted - 06 June 2008 : 06:00:29
|
A quick look at what you wrote seems to indicate that you can, indeed, also delete the original replies. I think I wrote something to avoid that, let me check if I saved it.< |
Snitz 3.4 Readme | Like the support? Support Snitz too |
|
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
Posted - 06 June 2008 : 06:20:09
|
Here is what I'd suggest:
1. Create a temporary int column, called it, R2 2. Use this script to update the value of R2 and delete all records with R2=1 (duplicate records, since original records will have R2=0)
DECLARE @prev int;
DECLARE @ID int;
DECLARE RC CURSOR
DYNAMIC
FOR SELECT REPLY_ID FROM FORUM_A_REPLY ORDER BY REPLY_ID;
SET @prev=0
OPEN RC
FETCH NEXT FROM RC INTO @ID
WHILE @@FETCH_STATUS=0
BEGIN
IF @ID = @prev
UPDATE FORUM_A_REPLY SET R2=1 WHERE CURRENT OF RC
Else
BEGIN
UPDATE FORUM_A_REPLY SET R2=0 WHERE CURRENT OF RC
SET @prev = @ID
END
FETCH NEXT FROM RC INTO @ID
END
CLOSE RC
DEALLOCATE RC
DELETE FROM FORUM_A_REPLY WHERE R2=1
I have tested this, but I have no current database without duplicate replies... if you can backup before using it, it's safe, though I'm almost sure it will work without damaging anything of value.
I did run a script similar to this on a live DB and it worked, but it was for topics, so this is untested in a live DB (yeah, it's a disclaimer ).
P.S.: Of course, once completed, delete R2.< |
Snitz 3.4 Readme | Like the support? Support Snitz too |
|
|
Podge
Support Moderator
Ireland
3775 Posts |
Posted - 06 June 2008 : 06:28:27
|
I adapted it rather than wrote the whole thing.
The HAVING Count(*) > 1 clause in the original select should prevent replies with no duplicates from being selected. The Count(*) - 1 which is used to set the rowcount should always leave one copy of the row. I don't think there is a possibility that rowcount could ever be set to 0 before the delete statement and it will always be one less than the number of duplicate rows.
Wouldn't the number of records returned by SELECT DISTINCT REPLY_ID return the number of rows I'm supposed to have in that table ?< |
Podge.
The Hunger Site - Click to donate free food | My Blog | Snitz 3.4.05 AutoInstall (Beta!)
My Mods: CAPTCHA Mod | GateKeeper Mod Tutorial: Enable subscriptions on your board
Warning: The post above or below may contain nuts. |
|
|
Podge
Support Moderator
Ireland
3775 Posts |
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
Posted - 06 June 2008 : 06:35:13
|
quote: Originally posted by Podge
quote: I'm almost sure it will work with damaging anything of value.
I hope thats a typo
I'm going to backup the database and try both to see if they return the same results.
Yeah, sorry, just corrected the original post.
Actually, my script won't work, because, I think, there is no way to distinguish between similar records(no primary key). I'm working on it and will update it ASAP.< |
Snitz 3.4 Readme | Like the support? Support Snitz too |
|
|
Podge
Support Moderator
Ireland
3775 Posts |
Posted - 06 June 2008 : 06:39:37
|
I think this came up before. Unless I'm mistaken indexes are not created on FORUM_A_REPLY or FORUM_A_TOPICS by a base install. Wouldn't indexes prevent this problem & benefit searches?
quote: The problem is, even for duplicate replies, you want to keep one of those, right?
My code should though.
SET ROWCOUNT @Count -- this count will always be one less than the amount of duplicates
DELETE FROM FORUM_A_REPLY WHERE REPLY_ID = @REPLY_ID
SET ROWCOUNT 0
In English - One less than the amount of duplicate rows should be deleted where rows have a count of 2 or more. Try your method also. I want to compare the results of both. If they agree I will be happy as it should confirm that I've done it correctly.< |
Podge.
The Hunger Site - Click to donate free food | My Blog | Snitz 3.4.05 AutoInstall (Beta!)
My Mods: CAPTCHA Mod | GateKeeper Mod Tutorial: Enable subscriptions on your board
Warning: The post above or below may contain nuts. |
|
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
Posted - 06 June 2008 : 06:51:23
|
To do it, the way I did it, it requires the addition of an (additional) identity column, which I named R3. The code would go like this:
DECLARE @prev int;
DECLARE @ID int,@R3 int;
DECLARE RC CURSOR
FOR SELECT REPLY_ID,R3 FROM FORUM_A_REPLY ORDER BY REPLY_ID;
SET @prev=0
OPEN RC
FETCH NEXT FROM RC INTO @ID,@R3
WHILE @@FETCH_STATUS=0
BEGIN
IF @ID = @prev
UPDATE FORUM_A_REPLY SET R2=1 WHERE R3=@R3
Else
BEGIN
UPDATE FORUM_A_REPLY SET R2=0 WHERE R3=@R3
SET @prev = @ID
END
FETCH NEXT FROM RC INTO @ID,@R3
END
CLOSE RC
DEALLOCATE RC
DELETE FROM FORUM_A_REPLY WHERE R2=1
This one works, I have created some duplicates and tested it. Seems a bit convoluted, though, your approach seems simpler. Anyway, here you have the code if you want to use it.< |
Snitz 3.4 Readme | Like the support? Support Snitz too |
|
|
Podge
Support Moderator
Ireland
3775 Posts |
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
|
|
Topic |
|