Author |
Topic  |
work mule
Senior Member
   
USA
1358 Posts |
Posted - 10 October 2002 : 04:44:25
|
Something else to consider...
For reference, here are the PK/FK fields:
CAT_ID (FK) FORUM_ID (FK) TOPIC_ID (FK) REPLY_ID (PK)
Thinking about how indexing works, if you create an index on those fields (this is the index labeled FORUM_SnitzC70), that index is going to be pretty well organized/sorted out.
Not that I'm going to recommend anyone do it without further testing/discussion, but what if you make that into a clustered index?
In a clustered index the records would be physically sorted by category id, forum id, topic id and reply id. If you're performing a search for something in a specific forum, the range of records for that forum id is already nicely defined in the clustered index.
I think clustered indexes would an excellent thing to use, but I'm undecided as to which fields would make the best candidate.
MSDN Articles: MS SQL Server 7.0 Performance Tuning Guide: Indexes Using Clustered Indexes Which is Faster: Index Access or Table Scan?
non-MSDN Articles: SQL Server 7.0: Merge Joins Clustered Index Suitability (discussion)
BTW...if you have that book, Chapters 8 and 11 might be worthwhile paging through. It's a pretty nice book which discusses database design without getting into specifics of any particular RDBMS.
Q: Are you a Programmer? A: No, but I stayed at a Holiday Inn Express last night. 
|
 |
|
pweighill
Junior Member
 
United Kingdom
453 Posts |
Posted - 10 October 2002 : 05:13:32
|
quote: Originally posted by work mule II
Not that I'm going to recommend anyone do it without further testing/discussion, but what if you make that into a clustered index?
A clustered index on an identity field is probably a waste of time.
Clustered indexes are good when you need to add lots of data, and do don't want to just be adding it at the end of the table, instead you want to be spreading out the adds over the table to reduce contention on the last page of the table.
For example, you might create a clustered index on member name or topic title if you wanted to spread out the adding. |
 |
|
work mule
Senior Member
   
USA
1358 Posts |
Posted - 10 October 2002 : 05:57:49
|
quote: A clustered index on an identity field is probably a waste of time.
When the identity field is the only field in the clustered index, yes. However in the case of the Reply table where we could include the category id, forum id, topic id (in that order) in the clustered index, that would probably be worth it. For instance, selecting all replies where Topic_ID = x would be a good case for a clustered index which has TOPIC_ID included.
An option that might work for the Reply table would be TOPIC_ID and R_DATE. All replies would be physically stored by TOPIC_ID and then in the order of the R_DATE field. Since the most used query which uses the Reply table is topic.asp and (unless it's been modified) the sort is by R_DATE, by having the data physically stored in that order, the database doesn't actually have to do a sort IF the Order By statement is exactly the same as the clustered index columns/order. Just imagine...total elimination of the sorting overhead. 
I'm thinking this might work... Forum_ID, Topic_ID, R_Date
Including Forum_ID might be good for the search page. When someone chooses to search in a specific forum, then the forum id could be used to narrow down the range of records through the reply table's clustered index.
If that was the case, then going back in topic.asp, the Order By statement (for the Replies) would have to be modified from strSql4 = " ORDER BY R.R_DATE ASC" to strSql4 = " ORDER BY R.FORUM_ID, R.TOPIC_ID, R.R_DATE ASC". SQL Server would see this as exactly the same sort order as the clustered index and therefore wouldn't have to perform a sort.
quote: Clustered indexes are good when you need to add lots of data, and do don't want to just be adding it at the end of the table, instead you want to be spreading out the adds over the table to reduce contention on the last page of the table.
SQL Server takes care of that tho'. Going back to the Reply table and topic id and date in the clustered index, that would spread out the records pretty well unless people only posted in one topic all the time.
In previous versions of SQL Server, for tables without a clustered index (tables like this are referred to as heaps) inserted rows would always be placed at the physical end of the table on disk. This created the possibility of a hot spot at the end of a very busy table. SQL Server 7.0's new storage management algorithms provide free space management, which removes this behavior. Now when rows are inserted in heaps, SQL Server makes use of the PFS pages to quickly locate available free space somewhere in the table in which the row can be inserted. PFS pages indicate free space through the table. This recovers deleted space and avoids insertion hot spots because inserts will be spread through the physical disk space throughout the table. Free space management affects clustered index selection. |
 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
Posted - 10 October 2002 : 07:43:09
|
The Snitz database does not strike me as overnormalized.
Regarding the use of a "composite" clustered index, I would say that's not a very good recommendation. I will post my reasons for saying this later, cause I'm a bit pressed for time now. |
Snitz 3.4 Readme | Like the support? Support Snitz too |
 |
|
alex042
Average Member
  
USA
631 Posts |
Posted - 10 October 2002 : 09:17:52
|
quote: The Snitz database does not strike me as overnormalized.
The Snitz database is nothing compared to some of the databases I've seen where they may have dozens of tables just for dropdown boxes with only a handful of options that are used to fill them. The kicker is that the relationships weren't setup correctly, the database had few keys, fields were populated using the data field since there were so few keys, and there were few or no integrity checks. One look at it told me it needed a rework. The Snitz database is a lot better compared to that one or some others I've seen for that matter.
|
Edited by - alex042 on 10 October 2002 09:19:54 |
 |
|
work mule
Senior Member
   
USA
1358 Posts |
Posted - 10 October 2002 : 11:08:45
|
quote:
The Snitz database does not strike me as overnormalized.
No, I don't think it's overnormalized at all.
However it would be a pretty good argument as to how someone would further normalize the database without affecting performance (especially the members table) and have performance measurements to back it up.
Good luck because when it comes to databases, everyone is an expert and has an opinion.  |
 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
Posted - 10 October 2002 : 11:20:23
|
quote: Originally posted by work mule II
However it would be a pretty good argument as to how someone would further normalize the database without affecting performance (especially the members table) and have performance measurements to back it up.
Personally I wouldn't want to change the DB structure that much, besides what's been posted at this thread (mostly getting rid of unneeded CAT_IDs and FORUM_IDs in some tables).
I probably wouldn't even call splitting the members table as normalization, since you'd be hard pressed to find other candidate keys besides MEMBER_ID, on which you could base your extra normalization steps. I agree that it would be an interesting thing to know whether splitting the MEMBERS table would have any meaningful impact (positive or negative) in performance, but I would risk saying that it is not at all that clear the split would bring performance gains. |
Snitz 3.4 Readme | Like the support? Support Snitz too |
 |
|
e3stone
Average Member
  
USA
885 Posts |
Posted - 10 October 2002 : 12:33:08
|
How would you suggest partitioning the members table? Have one table with static data (i.e. Username, name, AIM, email) and another with frequently updated data (i.e. lastheredate, post_count, etc)? |
<-- Eric --> |
 |
|
GauravBhabu
Advanced Member
    
4288 Posts |
Posted - 10 October 2002 : 13:19:40
|
I thought of keeping the required or some basic info(username, password, email, joindate, lastheredate etc) in the members table and member preferences and other personal/optional information in member_details table. Since most members just provide basic information, most columns imn members table remain empty. The member preferences or other optional information can be updated by the user after registration. However, I am not sure if it will bring any performance gains. |
 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
Posted - 10 October 2002 : 16:42:22
|
I think I'd choose a similar strategy to that of Gaurav, if I decided to split the table at all. Frequently accessed info in a table, other info (mainly optional or not so frequently used info) in another.
I'm not sure whether that would bring a performance improvement either. |
Snitz 3.4 Readme | Like the support? Support Snitz too |
 |
|
work mule
Senior Member
   
USA
1358 Posts |
Posted - 11 October 2002 : 12:53:52
|
quick comment about the members table:
I don't think it matters that the optional stuff is in the table or not. In a table with text fields, the table stores pointers and only uses them to access the text data if you include it in your SELECT statement. The pointers are only created if there is data. If there's nothing to be stored, there are no pointers. So to move them to a different table wouldn't really affect performance all that much.
Also, when you refer to optional stuff, are you including AIM, YAHOO, HOMEPAGE, etc type of fields? If so, that probably wouldn't be a good idea because if you look at the query to build a topic page (most likely the most used page), all of that information is included in the Select query. By putting it in another table, you're adding an additional join to the Select statement, not to mention additional complexity to the forum to manage these tables.
Once you take away all the fields which are used when displaying the topics.asp page, that doesn't leave you with very many fields. Leave out the text fields (bio, hobbies, etc) because they're just pointers, then what's left? Probably not much and most likely it doesn't warrant it's own table.
I know a concern might be that the table appears to wide and the argument could be how many rows can fit on a data page (or whatever - i'm not in a technical term mood atm). Well, if most of the fields are empty, then it doesn't really matter. Most important thing you could do is work towards having an effective indexing strategy on that table. |
 |
|
e3stone
Average Member
  
USA
885 Posts |
Posted - 11 October 2002 : 14:46:25
|
quote: In a table with text fields, the table stores pointers and only uses them to access the text data if you include it in your SELECT statement.
I need to read and see if this is true is DB's other than SQL Server. By default, SQL Server does do this, but you can run the sp_tableoption to change it so that it'll store text that is under a certain size in the table and then text values that are over are assigned a pointer and stored elsewhere. I usually do this because I don't want SQL Server to store a message like "just testing" elsewhere and assign a pointer to it. Added overhead that I really don't want.
You're right, though. If each ntext field just stores a pointer for any value, then it makes no sense to move those to another table. |
<-- Eric --> |
 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
Posted - 11 October 2002 : 15:13:45
|
In fact I was thinking about fields like MARSTATUS, LINK1, LINK2, CITY, STATE, PHOTO_URL, FIRSTNAME, LASTNAME, HOMEPAGE, which are all nvarchar and are optional. Also they are used only when you visit an user's profile. Anyway, I don't think the decision to split is all that clear... |
Snitz 3.4 Readme | Like the support? Support Snitz too |
Edited by - ruirib on 11 October 2002 15:19:52 |
 |
|
work mule
Senior Member
   
USA
1358 Posts |
Posted - 11 October 2002 : 16:10:53
|
quote: Originally posted by e3stone
I need to read and see if this is true is DB's other than SQL Server. By default, SQL Server does do this, but you can run the sp_tableoption to change it so that it'll store text that is under a certain size in the table and then text values that are over are assigned a pointer and stored elsewhere.
I don't know about mySQL. My guess is no it doesn't.
What you're referring to about the pointer or text is a very nice feature, but it looks like it's only available in SQL Server 2000.
Managing ntext, text, and image Data Using text in row In Microsoft SQL Server 2000, users can enable a text in row option on a table so it could store text, ntext, or image data in its data row.
To enable the option, execute the sp_tableoption stored procedure, specifying text in row as the option name and on as the option value. The default maximum size that can be stored in a row for a BLOB (binary large object: text, ntext, or image data) is 256 bytes, but values may range from 24 through 7000. To specify a maximum size that is not the default, specify an integer within the range as the option value.
text, ntext, or image strings are stored in the data row if the following conditions apply:
text in row is enabled.
The length of the string is shorter than the limit specified in @OptionValue
There is enough space available in the data row. When BLOB strings are stored in the data row, reading and writing the text, ntext, or image strings can be as fast as reading or writing character and binary strings. SQL Server does not have to access separate pages to read or write the BLOB string.
If a text, ntext, or image string is larger than the specified limit or the available space in the row, pointers are stored in the row instead. The conditions for storing the BLOB strings in the row still apply though: There must be enough space in the data row to hold the pointers.
For more information, see sp_tableoption.
|
Edited by - work mule on 11 October 2002 16:12:00 |
 |
|
e3stone
Average Member
  
USA
885 Posts |
Posted - 11 October 2002 : 16:30:32
|
Work Mule II -
You're right about it only being a 2000 feature. I seem to forget about the whole "backwards compatibility" issue. I need to open my eyes and look at the big picture instead of only seeing "2000....2000....2000....2000"  |
<-- Eric --> |
 |
|
Topic  |
|
|
|