Author |
Topic |
Deleted
deleted
4116 Posts |
Posted - 28 March 2002 : 23:03:35
|
<b><font color=red>v4-Discussion-1: Reducing the String Load</font id=red></b>
<font color=green>Note: Some of the numbers given are rounded values, as they differ from language to language and from version to version. Additionally, I don't have any benchmark results on how much load the current solution puts on the server.</font id=green>
In v40b03patch003 there are around 2000 strings, which makes a language file more than 200KB in size. With the current method, one, two, or more language files are included from config.asp, i.e. on every forum page hit. These numbers will increase as we continue to add new features.
Every language variable has a line in LangNNNN.asp (NNNN= Decimal LCID value of the locale) with the following format: <pre id=code><font face=courier size=2 id=code> strLangFILENAMEXXXXX = "translated string" '"original string" </font id=code></pre id=code> where FILENAME is the filename of the ASP file, XXXXX is a sequence number.
Currently, only the strings of the selected language are executed (2000 equates), although all of them (N languages * 2000 lines) are loaded through the INCLUDE.
<font color=green>These are some exact values:
Number of strings = 1971
Avg. string length = 32.1 chars
Number of unique strings = 1481
One line on average = 100 bytes</font id=green>
To reduce the server load, the following can be done:
<font color=red>a) We can get rid of the '"original string" part</font id=red>
<b>Pros</b>: reducing the size from 200K to 120K
<b>Cons</b>: Not easy to translate (solution: we can strip them out before releasing, they do already exist in Lang1033.asp file)
<b>My idea</b>: We can do this
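A minimal sketch of the stripping step in option a), written in Python rather than ASP so it can be run offline before a release. It assumes every string line has exactly the format shown above (identifier, translated literal, then a trailing ' comment); the function names `strip_original` and `strip_file` are made up for this illustration.

```python
import re

# Matches: strLang identifier = "string literal", followed by a trailing ' comment.
# VBScript escapes a quote inside a literal by doubling it (""), so the literal
# part accepts either a non-quote character or a doubled quote.
_LINE = re.compile(r'''^(\s*strLang\w+\s*=\s*"(?:[^"]|"")*")\s*'.*$''')


def strip_original(line: str) -> str:
    """Drop the trailing '"original string" comment from one assignment line."""
    match = _LINE.match(line)
    return match.group(1) if match else line


def strip_file(text: str) -> str:
    """Apply strip_original to every line; other lines pass through unchanged."""
    return "\n".join(strip_original(line) for line in text.splitlines())
```

Lines that do not look like a string assignment, such as copyright comments, are left alone, so the script can be re-run safely on an already stripped file.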
<font color=red>b) Dividing the language files</font id=red>: We can divide the language file into subsections (Admin*.asp, "rarely used"*.asp, "mostly used"*.asp), i.e. create the files LangAdminNNNN.asp, LangRareNNNN.asp and LangNNNN.asp. LangNNNN.asp will be included by config.asp to support the main forum behaviour (reading & posting - 756 strings), LangAdminNNNN.asp (543 strings) will be included only by the admin files, and LangRareNNNN.asp (policy, register, FAQ, help files, setup, some of the moderation related pop_*.asp etc - 672 strings) will be included only by those files.
I think 80% of the page hits will go to LangNNNN, 15% to LangRareNNNN, and 5% to LangAdminNNNN.
<b>Pros</b>: With the above assumption, 884 language strings will be executed on average.
<b>Cons</b>: The number of files to keep track of increases; we must put include directives into each file and decide which string goes where.
<b>My idea</b>: We can do this
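The 884 figure is not derived in the post. One reading that reproduces it, assuming LangNNNN.asp is still included on every page through config.asp so that rare and admin pages execute the main strings plus their own file, is a hit-weighted average (a Python sketch; the dictionary names are illustrative):

```python
# String counts per proposed file, as given in the post.
MAIN, ADMIN, RARE = 756, 543, 672

# Assumed share of page hits per page category (the post's 80/15/5 guess).
SHARE = {"main": 0.80, "rare": 0.15, "admin": 0.05}

# LangNNNN.asp is always included via config.asp, so rare and admin pages
# execute the main strings plus the strings of their own file.
EXECUTED = {"main": MAIN, "rare": MAIN + RARE, "admin": MAIN + ADMIN}


def average_executed(share=SHARE, executed=EXECUTED) -> float:
    """Hit-weighted average number of string assignments executed per page."""
    return sum(share[kind] * executed[kind] for kind in share)
```

Under these assumptions the average comes out at 883.95, which rounds to the 884 quoted above, and the three counts add up to the 1971 strings in the current single file.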
<font color=red>c) Keeping track of files</font id=red>: We can include a string just at the beginning of the file like: <pre id=code><font face=courier size=2 id=code> strThisFileName = "active.asp" </font id=code></pre id=code> or better get it from the script address, and use this info to execute only the related equates (using a select case for example). Some strings from inc_*.asp files must be executed anyway.
<b>Pros</b>: The number of executed equates may well drop to 250 on average. With this, we can support more detailed <TITLE> directives like "XXX Forums - Search Page" in any language. We can also support other MODs like Active Users.
<b>Cons</b>: Harder to keep track of, harder to maintain.
<b>My idea</b>: We can do this
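Option c) can be modelled as a lookup keyed by the requested script name. The sketch below is Python, not ASP, and the string names and contents are hypothetical; in classic ASP the script address would typically come from Request.ServerVariables("SCRIPT_NAME") and the dispatch would be a Select Case.

```python
import posixpath

# Hypothetical string tables; the real names and contents live in LangNNNN.asp.
SHARED = {"strLangInc_Top00010": "All Forums"}
PER_FILE = {
    "active.asp": {"strLangActive00010": "Active Topics"},
    "search.asp": {"strLangSearch00010": "Search Page"},
}


def strings_for(script_path: str) -> dict:
    """Return only the strings needed by the requested page."""
    name = posixpath.basename(script_path).lower()
    table = dict(SHARED)  # strings from inc_*.asp files are always needed
    table.update(PER_FILE.get(name, {}))
    return table
```

A page that is not listed simply gets the shared strings, which mirrors the remark that some inc_*.asp strings must be executed anyway.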
<font color=red>d) Define Global Strings (or Functions)</font id=red>: At an early stage I decided to keep each file standalone and not to define/share language variables among files. I.e. every file has its own definition for "All Forums". This increases the flexibility in language and in program writing, but we have some performance loss here. Note that there are no duplicates within a file, i.e. if the same phrase is used more than once, the same variable is used. Here are the current values: There are 46 such strings (some very similar) repeated more than 3 times, resulting in 322 definitions (322-46=276 duplicates).
We can define variables like <font color=green>LangGlobal00010 = "All Forums"</font id=green>.
<b>Pros</b>: Although this can be thought of as a good 14% reduction, it is not actually so. The average number of chars in these repeated strings is 12.8, so the savings in file size and server memory are much less (~8%). The real saving will be in the area of translation.
<b>Cons</b>: It will not be easy for the programmer to keep track of these "global" variables. You must use a "lexicon" to find what they are.
<b>My idea</b>: We should not do this. But most of these variables are repeated as part of repeated code, such as the folder structure at the beginning of each file, error messages saying "go back to enter data" or "There Was A Problem With Your Details", the link for the "admin section", and the entries for "username" or "password". I think it will be better to move some of these into inc_functions.asp as a whole, which also reduces the repeated strings.
<font color=red>e) Use of generic strings</font id=red>: It is more user/programmer friendly, but the wording used is very detailed in most cases. For example, just for going back, the following strings are used (the numbers show how many times each is repeated): <BLOCKQUOTE id=quote><font size=1 face="Verdana, Arial, Helvetica" id=quote>quote:<hr height=1 noshade id=quote>
Back 2
Back to Admin Forums 1
Back To Admin Home 11
Back to Archive Admin 1
Back To Forum 5
Back to Forums Administration 2
Back to Moderator Options 1
Back to previous page 1
Back To Search Page 1
Go Back 1
Go back to correct the problem. 2
Go Back To Enter Data 15
Go Back to Forum 2
Go Back to Re-Authenticate 4
Go Back To Retry 1
<hr height=1 noshade id=quote></BLOCKQUOTE id=quote></font id=quote><font face="Verdana, Arial, Helvetica" size=2 id=quote>
These can just be replaced by "Go Back", except the cases where there are two different destinations (if any). This methodology will significantly reduce the number of strings used, translated, loaded by the server, kept in the memory, ...
<b>Pros</b>: Significant reduction in string data.
<b>Cons</b>: Less user/programmer/administrator friendly.
<b>My idea</b>: We can carefully examine the language data to reduce the strings where appropriate.
Well, these are only my thoughts, I'm sure you can find more. Any idea will be very much appreciated <img src=icon_smile.gif border=0 align=middle>.
<font color=pink><b>Think Pink</b></font id=pink>Post v40b03 Patches |
|
Nathan
Help Moderator
USA
7664 Posts |
Posted - 28 March 2002 : 23:18:08
|
Any thoughts about using the FSO on servers where it is available?
The idea above I like least is b. I believe it would cause unnecessary confusion for people trying to use the forum. "Now, which file do I use where?"
Nathan Bales - Romans 15:13 ---------------------------------- Snitz Exchange | Do's and Dont's< |
|
|
Deleted
deleted
4116 Posts |
Posted - 29 March 2002 : 02:48:19
|
quote:
Any thoughts about using the FSO on servers where it is available?
Because it would reduce the number of potential users, the decision was NOT to use it in the base code. On the other hand, it is possible to provide an FSO based setup utility as a mod.
quote:
The idea above I like least is b. I believe it would cause unnecessary confusion for people trying to use the forum. "Now, which file do I use where?"
On the other hand, this will reduce the executed equates (and the file size) substantially. But the setup will not be so easy, because we don't have dynamic includes.
|
|
Nathan
Help Moderator
USA
7664 Posts |
Posted - 29 March 2002 : 02:53:11
|
How will it reduce file size if they are still going to install all the files?
|
|
HuwR
Forum Admin
United Kingdom
20584 Posts |
Posted - 29 March 2002 : 02:55:13
|
Would it not make more sense to use a database to store the string data, or would this make the database size the issue? It would not need to increase hits to the db much, as you just need to do it once per page to fetch the strings into an array.
< |
|
|
Nathan
Help Moderator
USA
7664 Posts |
Posted - 29 March 2002 : 02:59:01
|
I would think that db size would not be the issue there, Huw.
In a mid-sized forum, the 200K per language of strings in the database should be infinitesimal compared to the megabytes of topics and replies.
|
|
seahorse
Senior Member
USA
1075 Posts |
Posted - 29 March 2002 : 04:48:22
|
quote:
Would it not make more sense to use a database to store the string data, or would this make the database size the issue? It would not need to increase hits to the db much, as you just need to do it once per page to fetch the strings into an array.
If this were the case, would translations be done in the DB file? What would happen if someone wanted a forum capable of more than one language?
Ken
=============== The greatest tragedy is a child without a loving parent.< |
|
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
Posted - 29 March 2002 : 05:51:51
|
quote:
Would it not make more sense to use a database to store the string data, or would this make the database size the issue? It would not need to increase hits to the db much, as you just need to do it once per page to fetch the strings into an array.
If the problem is speed, wouldn't this result in a very big number of database calls just to get the strings, thus making page generation actually slower?
------------------------------------------------- Installation Guide | Do's and Dont's | MODs< |
|
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
Posted - 29 March 2002 : 06:54:01
|
I understand the need for a discussion of these issues. What I fail to see is the impact of this high number of equates on the time needed to generate a page. Is it significant enough to justify the effort that some of the options presented by Bozden would take?
With that question asked, let me take some time to address some of those options:
1. I think a) can be done. It won't be difficult to write a script that removes the original strings once translation is completed, thus alleviating the memory load on the server.
2. I tend to like b). I remember that the first time I looked at a langxxxx.asp file and saw some repeated strings, I thought the compromise between redundancy and maintainability of the lang file and the associated asp files was a very good one. A reduction in ease of maintenance is probably acceptable if it brings an increase in performance. How large that increase will be needs to be taken into account before choosing to do it. It seems to me, however, that this option is an interesting one, since maintainability won't suffer much if it is implemented.
3. Regarding c): given the number of cases the selection must cover, one for each of the Snitz files, will this be faster than implementing b)?
4. It doesn't look like there is much to gain from option d).
These are just my first comments. I think, as always, bozden has done a great job exploring the various options available to reduce the server load caused by the use of a single langXXXX.asp file. I'll try to think it over to see if I can come up with something else that could also make sense here.
|
|
Deleted
deleted
4116 Posts |
Posted - 29 March 2002 : 08:09:22
|
quote:
How will it reduce file size if they are still going to install all the files?
The sum of the three file sizes will be even larger (because of the copyright notices etc). But admin files, help files etc will be used only once a day. The main load will come from the common files (active, default, post, etc). Their language file will be loaded into server memory on each page hit (include directive; more files if you use more than one language), and its equates are executed.
I do not have enough information on real world situations (like how much time the server needs to render a usual default.asp or a topic.asp page with 20 replies etc).
Think of these design criteria:
* A web server hosts 100 webs with a forum (a usual number)
* There are 100 users online per web (=forum) (eh, we have to design for such cases)
* Every second, 20% of the users make the server run an active page (assumption)
=> There will be 100*100*0.2 = 2000 page view requests per second (the server must die :).
If the language-string-related processing takes 5% of the total processing (I think this is a very exaggerated value), and we cut it in half by optimizing it, 2.5% of the server time will be saved.
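The two estimates above can be written out as a small calculation. This is only a restatement of the post's own assumptions in Python (the function names are illustrative), not a measurement:

```python
def requests_per_second(sites: int, users_per_site: int, active_fraction: float) -> float:
    """Page requests per second if active_fraction of all users hit a page each second."""
    return sites * users_per_site * active_fraction


def time_saved(language_share: float, reduction: float) -> float:
    """Fraction of total server time saved when the language-string share
    of the work is cut by `reduction` (0.5 means halved)."""
    return language_share * reduction


load = requests_per_second(100, 100, 0.20)  # the post's design case
saving = time_saved(0.05, 0.5)              # a 5% share, halved
```

The second function makes the limit of the optimisation explicit: even removing the language-string work entirely could not save more than its assumed 5% share.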
As you can observe on the Snitz forum, the server renders an HTML page in 2-8 seconds (multitasking), 4 sec on average. Today, this forum has 50-100 users online, 157 at most.
I'm sure that the load internationalization puts on the server is negligible, but we must still do it as well as we can, because the future will rely on this.
|
|
Deleted
deleted
4116 Posts |
Posted - 29 March 2002 : 08:30:14
|
quote:
Would it not make more sense to use a database to store the string data, or would this make the database size the issue? It would not need to increase hits to the db much, as you just need to do it once per page to fetch the strings into an array.
In the old days (back in the 80's), when processing time and memory were very valuable, we optimized our code better than we do today. I remember disassembling the Turbo Pascal compiler output to sum the processor cycles a repeated piece of code needed and comparing it to my optimized assembly code (has anyone here used an assembler?). I was working on robotic vision in those days.
Today, I cannot do the same, as the Internet and/or a web server is too chaotic. But in any case the following information will help:
* How much memory is normally used to keep a string of length N, and how much is needed if we put it into an array
* How much extra time is needed to
  - pull M strings from the database (must be worst case)
  - put them into the array
* How much extra memory the database access needs.
I'm dumb when it comes to benchmarking on the Internet, and I couldn't find any sources on these topics on the net. Probably the design criteria have changed too much in 20 years. It would be very helpful if any of you could provide information on this. I already know the do's and don'ts presented on the Microsoft sites, but they never use numbers. As an engineer, I'm in deep need of those numbers.
One other thing is the maintainability of the language data. We would need to write a web interface for it, and it would not be so easy for the translators to cope with.
|
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
Posted - 29 March 2002 : 10:17:20
|
My first answer to the database string storage suggestion failed to notice that only a single database call was being suggested. Obviously a single database call would make my comment meaningless.
There would, however, be other problems with that approach. If the strings were loaded into an array, they would have to be indexed with numbers. That means the reference to a string in the code would need to be a number, which is not a very clear reference, and it would even need to rely on a fixed position in the array for the specific string being used at a given location in the code. That looks a bit error-prone to me.
Putting the strings in the database wouldn't be a very serious issue. Any translator could simply start from a langXXXX.asp file and, once the translation was concluded, a script would write the values to the database - something similar to what I believe you already do with langpacks, bozden. Even changes to the langXXXX.asp could be done that way, simply by running the script again and replacing the old values in the DB.
Another solution to the problem, albeit a somewhat more complicated one, would be to start with a langXXXX.asp file. Once the translation was concluded, a script could be run to analyze each of the source files and import into each source file the string definitions needed for that file. This would have the advantage of minimizing the number of strings needed per file, while maintaining the ease of translation you have today with a single langXXXX.asp file. The maintainability of the files could be a bit more difficult, but that could also be taken care of through a procedure similar to the one used to import the strings into all the source code files. This solution could offer the best of both worlds: ease of translation, and no server overload because of unnecessary string definitions. There would be a price to pay, however, in the "complexity" of the script that imports the needed string definitions into each file.
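The core of such an import script could look roughly like the Python sketch below. It assumes the strLang naming convention described at the top of the thread and one definition per line; the function name is made up, and writing the selected definitions back into each .asp file is left out.

```python
import re

# Variable naming follows the thread: strLang + file name + sequence number.
# VBScript identifiers are case-insensitive, so names are compared in lowercase.
_USE = re.compile(r"\bstrLang\w+", re.IGNORECASE)
_DEF = re.compile(r"^\s*(strLang\w+)\s*=", re.IGNORECASE)


def needed_definitions(source_text: str, lang_text: str) -> list:
    """Return the lines of lang_text that define variables used in source_text."""
    used = {name.lower() for name in _USE.findall(source_text)}
    selected = []
    for line in lang_text.splitlines():
        match = _DEF.match(line)
        if match and match.group(1).lower() in used:
            selected.append(line)
    return selected
```

Because the selection is driven by what each source file actually references, re-running it after a translation update would regenerate the per-file definitions from the single langXXXX.asp file.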
Edited by - ruirib on 29 March 2002 10:19:25< |
|
|
Nathan
Help Moderator
USA
7664 Posts |
Posted - 29 March 2002 : 11:30:09
|
quote: How much memory is used to keep a string of length N normally, how much it needs if we put into an array
When you're talking about just including a file, every string in that file will be loaded into memory when that page is accessed. Using a database, only the strings necessary for that page would be accessed.
quote: If the strings were loaded to an array they would have to be indexed with numbers. That meant that the reference to the strings in the code would need to be a number, not a very clear reference, and it would even need to rely on a fixed position of the string in the array for the specific string being used in a given location in the code.
Don't we use numbers to reference strings now? Could we not use the unique string ID to index the array?
Here is a potential table layout
STRING_ID (would be the same as the numeric IDs now, would be used to index array.)
S_FILE_ID (we would need to come up with an id system for the files, be it numbers or names.)
LANG_1033
LANG_NNNN
"SELECT STRING_ID, LANG_" & Lang_ID & " FROM FORUM_LANG WHERE FILE_ID = " & FILE_ID
or something
|
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
Posted - 29 March 2002 : 11:40:07
|
quote:
Don't we use numbers to reference strings now? Could we not use the unique string ID to index the array?
Do we use numbers? It looks to me like we use variable names, right?
quote:
Here is a potential table layout
STRING_ID (would be the same as the numeric IDs now, would be used to index array.)
S_FILE_ID (we would need to come up with an id system for the files, be it numbers or names.)
LANG_1033
LANG_NNNN
"SELECT STRING_ID, LANG_" & Lang_ID & " FROM FORUM_LANG WHERE FILE_ID = " & FILE_ID
My comment was not related to the table layout. If you define an array the indexes are numbered from 0 to a max. You access the array by using the index (0->max) not the contents. If the idea was to load the strings in a single DB call to an array, that array will have index values between 0 and the number of read strings minus 1. You cannot use the STRING_ID to index the array, can you?
------------------------------------------------- Installation Guide | Do's and Dont's | MODs< |
|
|
Nathan
Help Moderator
USA
7664 Posts |
Posted - 29 March 2002 : 12:08:50
|
quote: You cannot use the STRING_ID to index the array, can you?
You build the array based on the contents.
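strSql = "SELECT STRING_ID, LANG_" & Lang_ID & " AS LANG_STRING FROM FORUM_LANG WHERE FILE_ID = " & FILE_ID
Set rs = conn.Execute(strSql) ' conn is assumed to be an already open ADODB connection
Do While Not rs.EOF
  arr(rs("STRING_ID").Value) = rs("LANG_STRING").Value ' arr must be dimensioned up to the largest STRING_ID
  rs.MoveNext ' without this the loop never advances
Loop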
then you can use the string_Id to call the string value.
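The same idea can be tried out without an ASP server. The sketch below uses Python with an in-memory SQLite table standing in for the proposed FORUM_LANG table (the sample rows are invented), and a dictionary keyed by STRING_ID instead of a sparse array; in classic ASP a Scripting.Dictionary could play that role.

```python
import sqlite3


def load_strings(conn, lang_id: int, file_id: int) -> dict:
    """Fetch one file's strings for one language in a single query, keyed by STRING_ID."""
    column = f"LANG_{int(lang_id)}"  # int() keeps the interpolated column name numeric
    rows = conn.execute(
        f"SELECT STRING_ID, {column} FROM FORUM_LANG WHERE FILE_ID = ?",
        (file_id,),
    )
    return {string_id: text for string_id, text in rows}


# Tiny in-memory stand-in for the proposed FORUM_LANG table (sample data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FORUM_LANG (STRING_ID INTEGER, FILE_ID INTEGER, LANG_1033 TEXT)")
conn.executemany(
    "INSERT INTO FORUM_LANG VALUES (?, ?, ?)",
    [(10, 1, "All Forums"), (20, 1, "Go Back"), (30, 2, "Search Page")],
)
strings = load_strings(conn, 1033, 1)
```

Keying by ID means the lookup does not depend on row order or on the IDs being contiguous, which is the concern raised about plain array positions.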
|
|
ruirib
Snitz Forums Admin
Portugal
26364 Posts |
Posted - 29 March 2002 : 12:15:02
|
Ok Nathan, I see your point.
Can't that in fact be slower than the direct assignment of string values to string vars, since ops using recordsets are slower than normal var handling? It is likely, although you probably need to measure it, that you can lose more with that solution than what you had started with in the first place...
When I read about the DB storage for strings I thought GetRows would be used, but that had the disadvantage I talked about. This approach doesn't have it, but it will be slower.
Edited by - ruirib on 29 March 2002 12:16:43< |
|
|