Snitz Forums 2000
Snitz Forums 2000
Home | Profile | Register | Active Topics | Members | Search | FAQ
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 Help Groups for Snitz Forums 2000 Users
 Help: Database: MS SQL Server
 Eliminating Duplicate Archive Reply's
 New Topic  Topic Locked
 Printer Friendly
Author Previous Topic Topic Next Topic  

Podge
Support Moderator

Ireland
3775 Posts

Posted - 06 June 2008 :  05:38:20  Show Profile  Send Podge an ICQ Message  Send Podge a Yahoo! Message
As a result of installing the full text search on a forum I have discovered multiple duplicate rows on the FORUM_A_REPLY table. Some have 2, 6 or even 8 occurrences.

After reading this - http://www.sqlteam.com/article/deleting-duplicate-records

I tried this but it won't work because you cannot use DISTINCT with text or ntext columns
sp_rename 'FORUM_A_REPLY', 'TEMP_FORUM_A_REPLY'

select distinct *
into FORUM_A_REPLY
from TEMP_FORUM_A_REPLY

drop table TEMP_FORUM_A_REPLY
Am I right in thinking that the first step is going to be to create an identity column? or will this T-SQL work ?

DECLARE @Count int
DECLARE @REPLY_ID nvarchar(50)


DECLARE duplicate_cursor CURSOR FAST_FORWARD FOR 
SELECT REPLY_ID, Count(*) - 1
FROM FORUM_A_REPLY 
GROUP BY REPLY_ID 
HAVING Count(*) > 1

OPEN duplicate_cursor 

FETCH NEXT FROM duplicate_cursor INTO @REPLY_ID, @Count

WHILE @@FETCH_STATUS = 0
BEGIN

SET ROWCOUNT @Count
DELETE FROM FORUM_A_REPLY WHERE REPLY_ID = @REPLY_ID
SET ROWCOUNT 0

FETCH NEXT FROM duplicate_cursor INTO @REPLY_ID, @Count
END

CLOSE duplicate_cursor 
DEALLOCATE duplicate_cursor


I can always check if the above code eliminates duplicates however I afraid it may delete non-duplicates also. Any advice (apart from backing everything up before trying it). How can I check I have the correct amount of reply's after it executes ?<

Podge.

The Hunger Site - Click to donate free food | My Blog | Snitz 3.4.05 AutoInstall (Beta!)

My Mods: CAPTCHA Mod | GateKeeper Mod
Tutorial: Enable subscriptions on your board

Warning: The post above or below may contain nuts.

HuwR
Forum Admin

United Kingdom
20584 Posts

Posted - 06 June 2008 :  05:51:29  Show Profile  Visit HuwR's Homepage
quote:
How can I check I have the correct amount of reply's after it executes ?
you should be able to use the reply_count from the topic to ascertain how many replies there should be<
Go to Top of Page

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 06 June 2008 :  06:00:29  Show Profile  Send ruirib a Yahoo! Message
A quick look at what you wrote seems to indicate that you can, indeed, also delete the original replies. I think I wrote something to avoid that, let me check if I saved it.<


Snitz 3.4 Readme | Like the support? Support Snitz too
Go to Top of Page

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 06 June 2008 :  06:20:09  Show Profile  Send ruirib a Yahoo! Message
Here is what I'd suggest:

1. Create a temporary int column, called it, R2
2. Use this script to update the value of R2 and delete all records with R2=1 (duplicate records, since original records will have R2=0)

DECLARE @prev int;
DECLARE @ID int;
DECLARE RC CURSOR 
DYNAMIC
FOR SELECT REPLY_ID FROM FORUM_A_REPLY ORDER BY REPLY_ID;

SET @prev=0
OPEN RC

FETCH NEXT FROM RC INTO @ID

WHILE @@FETCH_STATUS=0
BEGIN

   IF @ID = @prev
      UPDATE FORUM_A_REPLY SET R2=1 WHERE CURRENT OF RC
   Else
      BEGIN
	UPDATE FORUM_A_REPLY SET R2=0 WHERE CURRENT OF RC
        SET @prev = @ID
      END

 FETCH NEXT FROM RC INTO @ID

END

CLOSE RC
DEALLOCATE RC

DELETE FROM FORUM_A_REPLY WHERE R2=1


I have tested this, but I have no current database without duplicate replies... if you can backup before using it, it's safe, though I'm almost sure it will work without damaging anything of value.

I did run a script similar to this on a live DB and it worked, but it was for topics, so this is untested in a live DB (yeah, it's a disclaimer ).

P.S.: Of course, once completed, delete R2.<


Snitz 3.4 Readme | Like the support? Support Snitz too
Go to Top of Page

Podge
Support Moderator

Ireland
3775 Posts

Posted - 06 June 2008 :  06:28:27  Show Profile  Send Podge an ICQ Message  Send Podge a Yahoo! Message
I adapted it rather than wrote the whole thing.

The HAVING Count(*) > 1 clause in the original select should prevent replies with no duplicates from being selected. The Count(*) - 1 which is used to set the rowcount should always leave one copy of the row. I don't think there is a possibility that rowcount could ever be set to 0 before the delete statement and it will always be one less than the number of duplicate rows.

Wouldn't the number of records returned by SELECT DISTINCT REPLY_ID return the number of rows I'm supposed to have in that table ?<

Podge.

The Hunger Site - Click to donate free food | My Blog | Snitz 3.4.05 AutoInstall (Beta!)

My Mods: CAPTCHA Mod | GateKeeper Mod
Tutorial: Enable subscriptions on your board

Warning: The post above or below may contain nuts.
Go to Top of Page

Podge
Support Moderator

Ireland
3775 Posts

Posted - 06 June 2008 :  06:30:47  Show Profile  Send Podge an ICQ Message  Send Podge a Yahoo! Message
quote:
I'm almost sure it will work with damaging anything of value.

I hope thats a typo

I'm going to backup the database and try both to see if they return the same results.<

Podge.

The Hunger Site - Click to donate free food | My Blog | Snitz 3.4.05 AutoInstall (Beta!)

My Mods: CAPTCHA Mod | GateKeeper Mod
Tutorial: Enable subscriptions on your board

Warning: The post above or below may contain nuts.
Go to Top of Page

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 06 June 2008 :  06:33:02  Show Profile  Send ruirib a Yahoo! Message
The problem is, even for duplicate replies, you want to keep one of those, right?<


Snitz 3.4 Readme | Like the support? Support Snitz too
Go to Top of Page

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 06 June 2008 :  06:35:13  Show Profile  Send ruirib a Yahoo! Message
quote:
Originally posted by Podge

quote:
I'm almost sure it will work with damaging anything of value.

I hope thats a typo

I'm going to backup the database and try both to see if they return the same results.


Yeah, sorry, just corrected the original post.

Actually, my script won't work, because, I think, there is no way to distinguish between similar records(no primary key). I'm working on it and will update it ASAP.<


Snitz 3.4 Readme | Like the support? Support Snitz too
Go to Top of Page

Podge
Support Moderator

Ireland
3775 Posts

Posted - 06 June 2008 :  06:39:37  Show Profile  Send Podge an ICQ Message  Send Podge a Yahoo! Message
I think this came up before. Unless I'm mistaken indexes are not created on FORUM_A_REPLY or FORUM_A_TOPICS by a base install. Wouldn't indexes prevent this problem & benefit searches?

quote:
The problem is, even for duplicate replies, you want to keep one of those, right?
My code should though.
SET ROWCOUNT @Count -- this count will always be one less than the amount of duplicates
DELETE FROM FORUM_A_REPLY WHERE REPLY_ID = @REPLY_ID
SET ROWCOUNT 0


In English - One less than the amount of duplicate rows should be deleted where rows have a count of 2 or more. Try your method also. I want to compare the results of both. If they agree I will be happy as it should confirm that I've done it correctly.<

Podge.

The Hunger Site - Click to donate free food | My Blog | Snitz 3.4.05 AutoInstall (Beta!)

My Mods: CAPTCHA Mod | GateKeeper Mod
Tutorial: Enable subscriptions on your board

Warning: The post above or below may contain nuts.
Go to Top of Page

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 06 June 2008 :  06:43:26  Show Profile  Send ruirib a Yahoo! Message
Ok, regarding the code.

Yes, the table should have a primary key (reply_id) and indexes too.<


Snitz 3.4 Readme | Like the support? Support Snitz too
Go to Top of Page

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 06 June 2008 :  06:51:23  Show Profile  Send ruirib a Yahoo! Message
To do it, the way I did it, it requires the addition of an (additional) identity column, which I named R3. The code would go like this:

DECLARE @prev int;
DECLARE @ID int,@R3 int;
DECLARE RC CURSOR 
FOR SELECT REPLY_ID,R3 FROM FORUM_A_REPLY ORDER BY REPLY_ID;

SET @prev=0
OPEN RC

FETCH NEXT FROM RC INTO @ID,@R3

WHILE @@FETCH_STATUS=0
BEGIN

   IF @ID = @prev
      UPDATE FORUM_A_REPLY SET R2=1 WHERE R3=@R3
   Else
      BEGIN
	UPDATE FORUM_A_REPLY SET R2=0 WHERE R3=@R3
        SET @prev = @ID
      END

 FETCH NEXT FROM RC INTO @ID,@R3

END

CLOSE RC
DEALLOCATE RC

DELETE FROM FORUM_A_REPLY WHERE R2=1


This one works, I have created some duplicates and tested it. Seems a bit convoluted, though, your approach seems simpler. Anyway, here you have the code if you want to use it.<


Snitz 3.4 Readme | Like the support? Support Snitz too
Go to Top of Page

Podge
Support Moderator

Ireland
3775 Posts

Posted - 06 June 2008 :  08:20:37  Show Profile  Send Podge an ICQ Message  Send Podge a Yahoo! Message
Just to be clear. I need to create a temporary int column called R2 & another identity column called R3 like this
Alter table FORUM_A_REPLY ADD R2 int NULL
ALTER TABLE FORUM_A_REPLY ADD R3 int IDENTITY(1,1)
and then delete R2 & R3 after I finish running your script.

Thanks for your help again Rui. I'll let you know the results.
<

Podge.

The Hunger Site - Click to donate free food | My Blog | Snitz 3.4.05 AutoInstall (Beta!)

My Mods: CAPTCHA Mod | GateKeeper Mod
Tutorial: Enable subscriptions on your board

Warning: The post above or below may contain nuts.

Edited by - Podge on 06 June 2008 08:21:21
Go to Top of Page

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 06 June 2008 :  08:26:10  Show Profile  Send ruirib a Yahoo! Message
Yep, that's about it.<


Snitz 3.4 Readme | Like the support? Support Snitz too
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Topic Locked
 Printer Friendly
Jump To:
Snitz Forums 2000 © 2000-2021 Snitz™ Communications Go To Top Of Page
This page was generated in 0.14 seconds. Powered By: Snitz Forums 2000 Version 3.4.07