Snitz Forums 2000

Deleted
deleted

4116 Posts

Posted - 28 March 2002 :  23:03:35
<b><font color=red>v4-Discussion-1: Reducing the String Load</font id=red></b>

<font color=green>Note: Some of the numbers given are rounded values as they differ from language to language and from version to version. Additionally, I don't have any benchmark results on how much load the current solution puts on the server.</font id=green>

In v40b03patch003 there are around 2000 strings, which make a language file more than 200KB in size. With the current method, one, two, or more language files are included from config.asp, i.e. on every forum page hit. This number will increase as we continue to add new features.

Every language variable has a line in LangNNNN.asp (NNNN= Decimal LCID value of the locale) with the following format:
<pre id=code><font face=courier size=2 id=code>
strLangFILENAMEXXXXX = "translated string" '"original string"
</font id=code></pre id=code>
where FILENAME is the filename of the ASP file, XXXXX is a sequence number.

Currently, only the strings of the selected language are executed (2000 equates), although all of them (N languages * 2000 lines) are loaded through the INCLUDE.

<font color=green>These are some exact values:
Number of strings = 1971
Avg. string length = 32.1 chars
Number of unique strings = 1481
One line on average = 100 bytes
</font id=green>

To reduce the server load, the following can be done:

<font color=red>a) We can get rid of the '"original string" part</font id=red>

<b>Pros</b>: reducing the size from 200K to 120K
<b>Cons</b>: Not easy to translate (solution: we can strip them out before releasing, they do already exist in Lang1033.asp file)
<b>My idea</b>: We can do this
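
Stripping the original strings before a release could be scripted. Below is a minimal sketch, written in Python for brevity rather than ASP, assuming every line follows the `name = "translated" '"original"` format described above. The function name `strip_original` and the sample variable names are hypothetical.

```python
import re

# Matches: name = "translated string" '"original string"
# Inside a VBScript string literal a double quote is written as "".
LANG_LINE = re.compile(r'''^(\s*\w+\s*=\s*"(?:[^"]|"")*")\s*'.*$''')

def strip_original(line):
    """Drop the trailing '"original string" comment from one language line."""
    # Lines that do not match the pattern are returned unchanged.
    return LANG_LINE.sub(r"\1", line)

sample = 'strLangActive00010 = "Alle Foren" \'"All Forums"'
print(strip_original(sample))
```

Apostrophes inside the translated string are left alone, because the pattern only treats an apostrophe as a comment start after the closing double quote.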

<font color=red>b) Dividing the language files</font id=red>: We can divide the language file into subsections (Admin*.asp, "rarely used"*.asp, "mostly used"*.asp), i.e. creating the files LangAdminNNNN.asp, LangRareNNNN.asp, LangNNNN.asp. LangNNNN.asp will be included by config.asp to support main forum behaviour (reading & posting - 756 strings), LangAdminNNNN.asp (543 strings) will be included only by the admin files, and LangRareNNNN.asp (policy, register, FAQ, help files, setup, some of the moderation related pop_*.asp etc - 672 strings) will be included only by those files.

I think 80% of the page hits will need only LangNNNN, 15% will also need LangRareNNNN, and 5% will also need LangAdminNNNN.

<b>Pros</b>: With the above assumption, there will be 884 language strings executed on the average.
<b>Cons</b>: The number of files to keep track of increases, include directives must be put into each file, and we must decide which string goes where.
<b>My idea</b>: We can do this
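
The figure of 884 executed strings can be reproduced with a short calculation. This sketch (in Python, purely as arithmetic) assumes that LangNNNN.asp is included on every page hit through config.asp, and that the 15% and 5% shares describe the hits that additionally include the rare and admin files.

```python
# String counts per file, as given in the post.
MAIN, ADMIN, RARE = 756, 543, 672

# Assumed share of page hits that also need the rare or the admin file;
# the main file is loaded on every hit.
P_RARE, P_ADMIN = 0.15, 0.05

expected = MAIN + P_RARE * RARE + P_ADMIN * ADMIN
print(round(expected))  # 884
```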

<font color=red>c) Keeping track of files</font id=red>: We can include a string just at the beginning of the file like:
<pre id=code><font face=courier size=2 id=code>
strThisFileName = "active.asp"
</font id=code></pre id=code>
or better get it from the script address, and use this info to execute only the related equates (using a select case for example). Some strings from inc_*.asp files must be executed anyway.

<b>Pros</b>: The number of executed equates may well drop to 250 on average. With this, we can support more detailed <TITLE> directives like "XXX Forums - Search Page" in any language. We can also support other MODs like Active Users.
<b>Cons</b>: Harder to keep track, harder to maintain.
<b>My idea</b>: We can do this
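
The selection step in c) might look like the following. It is only a sketch in Python, not ASP; the tables `SHARED` and `PER_PAGE`, the function `strings_for`, and all variable names and texts in them are hypothetical stand-ins for the real language data.

```python
# Strings from inc_*.asp files, which every page needs.
SHARED = {"strLangInc00010": "All Forums"}

# Strings grouped by the page that uses them.
PER_PAGE = {
    "active.asp": {"strLangActive00010": "Active Topics"},
    "search.asp": {"strLangSearch00010": "Search Page"},
}

def strings_for(script_path):
    """Return only the strings needed by the requested page."""
    # Take the file name from the script address, e.g. "/forum/active.asp".
    page = script_path.rsplit("/", 1)[-1].lower()
    selected = dict(SHARED)
    selected.update(PER_PAGE.get(page, {}))
    return selected

print(sorted(strings_for("/forum/active.asp")))
```

A page without its own entry still receives the shared strings, which mirrors the remark that some inc_*.asp strings must be executed anyway.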

<font color=red>d) Define Global Strings (or Functions)</font id=red>: At an early stage I decided to keep each file standalone and not to define/share language variables among files, i.e. every file has its own definition for "All Forums". This increases the flexibility in language and in program writing, but we have some performance loss here. Note that there are no duplicates within a file, i.e. if the same phrase is used more than once, the same variable is used. Here are the current values: there are 46 such strings (some very similar) repeated more than 3 times, resulting in 322 definitions (322-46=276 duplicates).

We can define variables like <font color=green>LangGlobal00010 = "All Forums"</font id=green>.

<b>Pros</b>: Although this can be thought of as a good 14% reduction, it is not actually so. The average number of chars in these repeated strings is 12.8, so the savings in file size and server memory are much smaller (~8%). The real saving will be in the area of translation.
<b>Cons</b>: It will not be easy for the programmer to keep track of these "global" variables. You must use a "lexicon" to find what they are.
<b>My idea</b>: We should not do this. However, most of these variables are repeated as part of repeated code, such as the folder structure at the beginning of each file, error messages saying "go back to enter data" or "There Was A Problem With Your Details", the link for the "admin section", and entries for "username" or "password". I think it will be better to move some of this code into inc_functions.asp as a whole, which also reduces the repeated strings.

<font color=red>e) Use of generic strings</font id=red>: The current wording is more user/programmer friendly, but it is very detailed in most cases. For example, just for going back, the following strings are used (the numbers show how many times each is repeated):
<BLOCKQUOTE id=quote><font size=1 face="Verdana, Arial, Helvetica" id=quote>quote:<hr height=1 noshade id=quote>
Back 2
Back to Admin Forums 1
Back To Admin Home 11
Back to Archive Admin 1
Back To Forum 5
Back to Forums Administration 2
Back to Moderator Options 1
Back to previous page 1
Back To Search Page 1
Go Back 1
Go back to correct the problem. 2
Go Back To Enter Data 15
Go Back to Forum 2
Go Back to Re-Authenticate 4
Go Back To Retry 1
<hr height=1 noshade id=quote></BLOCKQUOTE id=quote></font id=quote><font face="Verdana, Arial, Helvetica" size=2 id=quote>

These can just be replaced by "Go Back", except the cases where there are two different destinations (if any). This methodology will significantly reduce the number of strings used, translated, loaded by the server, kept in the memory, ...

<b>Pros</b>: Significant reduction in string data.
<b>Cons</b>: Less user/programmer/administrator friendly.
<b>My idea</b>: We can carefully examine the language data to reduce the strings where appropriate.
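
To put a number on the "go back" example, the counts from the quoted list can simply be added up. A small Python sketch; the dictionary is copied from the list, nothing else is assumed.

```python
# Phrase -> number of times it is defined, copied from the quoted list.
BACK_PHRASES = {
    "Back": 2,
    "Back to Admin Forums": 1,
    "Back To Admin Home": 11,
    "Back to Archive Admin": 1,
    "Back To Forum": 5,
    "Back to Forums Administration": 2,
    "Back to Moderator Options": 1,
    "Back to previous page": 1,
    "Back To Search Page": 1,
    "Go Back": 1,
    "Go back to correct the problem.": 2,
    "Go Back To Enter Data": 15,
    "Go Back to Forum": 2,
    "Go Back to Re-Authenticate": 4,
    "Go Back To Retry": 1,
}

distinct = len(BACK_PHRASES)
definitions = sum(BACK_PHRASES.values())
print(distinct, definitions)  # 15 50
```

So 15 distinct phrases account for 50 definitions; if all of them could be collapsed into a single "Go Back", translators would handle one phrase instead of 15.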



Well, these are only my thoughts, I'm sure you can find more. Any idea will be very much appreciated <img src=icon_smile.gif border=0 align=middle>.


<font color=pink><b>Think Pink</b></font id=pink>Post v40b03 Patches

Nathan
Help Moderator

USA
7664 Posts

Posted - 28 March 2002 :  23:18:08
Any thoughts about using the FSO on servers where it is available?

The idea above I like least is b. I believe it would cause unnecessary confusion for people trying to use the forum. "Now, which file do I use where?"

  Nathan Bales - Romans 15:13
----------------------------------

Snitz Exchange | Do's and Dont's

Deleted
deleted

4116 Posts

Posted - 29 March 2002 :  02:48:19
quote:

Any thoughts about using the FSO on servers where it is available?


Because it would reduce the number of potential users, the decision was NOT to use it in the base code. On the other hand, it is possible to provide an FSO-based setup utility as a mod.

quote:

The idea above I like least is b. I believe it would cause unnecessary confusion for people trying to use the forum. "Now, which file do I use where?"


On the other hand, this will reduce the executed equates (and the file size) substantially. But the setup will not be so easy because we don't have dynamic includes.



Nathan
Help Moderator

USA
7664 Posts

Posted - 29 March 2002 :  02:53:11
How will it reduce file size if they are still going to install all the files?

  Nathan Bales - Romans 15:13
----------------------------------

Snitz Exchange | Do's and Dont's

HuwR
Forum Admin

United Kingdom
20580 Posts

Posted - 29 March 2002 :  02:55:13
Would it not make more sense to use a database to store the string data, or would this make the database size the issue? It would not need to increase hits to the db much, as you just need to do it once per page to fetch the strings into an array.


Nathan
Help Moderator

USA
7664 Posts

Posted - 29 March 2002 :  02:59:01
I would think that db size would not be the issue there, Huw.

The 200K per language of strings in the database should be infinitesimal compared to the megabytes of topics and replies in a mid-sized forum.

  Nathan Bales - Romans 15:13
----------------------------------

Snitz Exchange | Do's and Dont's

seahorse
Senior Member

USA
1075 Posts

Posted - 29 March 2002 :  04:48:22
quote:

Would it not make more sense to use a database to store the string data, or would this make the database size the issue? It would not need to increase hits to the db much, as you just need to do it once per page to fetch the strings into an array.



If this were the case, would translations be done in the DB file? What would happen if someone wanted a forum capable of more than one language?


Ken

===============
The greatest tragedy is a child without a loving parent.

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 29 March 2002 :  05:51:51
quote:

Would it not make more sense to use a database to store the string data, or would this make the database size the issue? It would not need to increase hits to the db much, as you just need to do it once per page to fetch the strings into an array.



If the problem is speed, wouldn't this result in a very big number of database calls just to get the strings, thus making page generation actually slower?

-------------------------------------------------
Installation Guide | Do's and Dont's | MODs

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 29 March 2002 :  06:54:01
I understand the need for a discussion of these issues. What I fail to see is the impact of this high number of equates on the time needed to generate the page. Is it significant enough to justify the effort required by some of the options presented by Bozden?

With that question asked, let me take some time to address some of those options:

1. I think a) can be done. It won't be difficult to write a script that removes the original strings once translation is completed, thus alleviating memory load on the server.

2. I tend to like b). I remember that the first time I looked at a langxxxx.asp file and saw some repeated strings, I thought the compromise between redundancy and maintainability of the lang file and associated asp files was a very good one. I think a reduction in ease of maintenance is probably acceptable if it brings an increase in performance. How large that increase will be needs to be taken into account before choosing to do it.
It seems to me, however, that this option is an interesting one, since maintainability won't become much harder if it is implemented.

3. Regarding c): given the number of selections needed to take each of the Snitz files into consideration, will this be faster than implementing b)?

4. It doesn't look like there is much to gain from option d).

These are just my first comments. I think, as always, bozden has done a great job exploring the various options available to reduce the server load due to the use of a single langXXXX.asp file. I'll try to think it over to see if I can come up with something else that could also make sense here.

-------------------------------------------------
Installation Guide | Do's and Dont's | MODs

Deleted
deleted

4116 Posts

Posted - 29 March 2002 :  08:09:22
quote:

How will it reduce file size if they are still going to install all the files?



The sum of the three file sizes will even be larger (because of the copyright notices etc.). But admin files, help files, etc. will be used perhaps once a day. The main load will come from the common files (active, default, post, etc.). Their language file will be loaded into server memory on each page hit (through the include directive, and there will be more if you use more than one language), and its equates will be executed.

I do not have enough information on real world situations (like how much time the server needs to render a usual default.asp or a topic.asp page with 20 replies etc).

Think of these design criteria:
* A web server hosts 100 web sites, each with a forum (a usual number)
* There are 100 users online per site (= forum) (eh, we have to design for such cases)
* Every second, 20% of the users make the server run an active page (assumption)
=> There will be 100*100*0.2 = 2000 page view requests per second (the server must die :).

If the language string related processing takes 5% of the total processing (I think this is a very exaggerated value), and we cut it in half by optimizing it, 2.5% of server time will be saved.
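
The two figures above follow from simple arithmetic. A Python sketch of it, using only the assumptions stated in the list (100 sites, 100 users each, 20% active per second, a 5% string share that is halved):

```python
sites = 100
users_per_site = 100
active_percent = 20        # share of users triggering a page per second

requests_per_second = sites * users_per_site * active_percent // 100

string_share_percent = 5   # assumed (exaggerated) share of processing time
saved_percent = string_share_percent * 0.5  # halving the string-related work

print(requests_per_second, saved_percent)  # 2000 2.5
```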

As you can observe on the Snitz forum, the server renders the html page in 2-8 seconds (multitasking), 4 sec on average. Today, this forum has 50-100 users online and 157 at most.

I'm sure that the load the internationalization puts on the server is negligible, but anyway, we must do our best, because the future will rely on this.



Deleted
deleted

4116 Posts

Posted - 29 March 2002 :  08:30:14
quote:

Would it not make more sense to use a database to store the string data, or would this make the database size the issue? It would not need to increase hits to the db much, as you just need to do it once per page to fetch the strings into an array.



In the old days (back in the '80s), when processing time and memory were very valuable, we optimized our code better than we do today. I remember disassembling the Turbo Pascal compiler output to sum the processor cycles a repeated piece of code needed and compare it to my optimized assembly code (has anyone here used an assembler?). I was working on robotic vision in those days.

Today I cannot do the same, as the Internet and/or a web server is far too chaotic. But in any case the following information would help:

* How much memory is normally used to keep a string of length N, and how much is needed if we put it into an array
* How much extra time is needed to
- pull M strings from the database (must be worst case)
- put them into the array
* How much extra memory the database access needs.

I'm clueless about benchmarking on the Internet, and couldn't find any sources on these topics on the net. Probably the design criteria have changed too much in 20 years. It would be very helpful if any of you could provide information on this. I already know the do's and don'ts presented on Microsoft sites, but they never use numbers. As an engineer I'm in deep need of those numbers.

One other thing is the maintainability of the language data. We would need to write a web interface for it, and it would not be so easy for the translators to cope with it.



ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 29 March 2002 :  10:17:20
My first answer to the database string storage suggestion failed to notice that only a single database call was being suggested. Obviously a single database call would make my comment meaningless.

There would, however, be other problems with that approach. If the strings were loaded into an array, they would have to be indexed with numbers. That means the reference to a string in the code would need to be a number, which is not a very clear reference, and it would even rely on a fixed position in the array for the specific string used at a given location in the code. That looks a bit error-prone to me.

Putting the strings in the database wouldn't be a very serious issue. Any translator could simply start from a langXXXX.asp file and, once the translation was finished, a script would write the values to the database, something similar to what I believe you already do with langpacks, bozden. Even changes to langXXXX.asp could be handled that way, by simply running the script again and replacing the old values in the DB.

Another solution to the problem, albeit a somewhat more complicated one, would be to start with a langXXXX.asp file. Once the translation was finished, a script could be run to analyse each of the source files and import into each one the string definitions needed for that file. This would have the advantage of minimizing the number of strings needed per file while maintaining the ease of translation you have today with a single langXXXX.asp file. Maintaining the files could be a bit more difficult, but that could also be taken care of through a procedure similar to the one used to import the strings into all the source code files.
This solution could offer the best of both worlds: ease of translation and no server overload from unnecessary string definitions. There would be a price to pay, however, in the "complexity" of the script that imports the needed string definitions into each file.
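
The core of such an import script is small. Here is a rough Python sketch of the selection step only, assuming that all language variables start with `strLang` as in the format described at the top of the thread; the function name `needed_definitions` and the sample lines are hypothetical, and writing the selected lines into each source file is left out.

```python
import re

# Language variables are assumed to look like strLangFILENAMEXXXXX.
IDENT = re.compile(r"\bstrLang\w+")

def needed_definitions(page_source, lang_lines):
    """Return the language-file lines whose variable is used in page_source."""
    used = set(IDENT.findall(page_source))
    selected = []
    for line in lang_lines:
        # The variable name is everything before the first "=".
        name = line.split("=", 1)[0].strip()
        if name in used:
            selected.append(line)
    return selected

lang_lines = [
    'strLangActive00010 = "All Forums"',
    'strLangActive00020 = "Active Topics"',
    'strLangSearch00010 = "Search Page"',
]
page = 'Response.Write strLangActive00010 & " - " & strLangActive00020'
print(needed_definitions(page, lang_lines))
```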

-------------------------------------------------
Installation Guide | Do's and Dont's | MODs


Edited by - ruirib on 29 March 2002 10:19:25

Nathan
Help Moderator

USA
7664 Posts

Posted - 29 March 2002 :  11:30:09
quote:
How much memory is used to keep a string of length N normally, how much it needs if we put into an array


When you're talking about just including a file, every string in that file will be loaded into memory when that page is accessed. Using a database, only the strings necessary for that page would be accessed.

quote:
If the strings were loaded to an array they would have to be indexed with numbers. That meant that the reference to the strings in the code would need to be a number, not a very clear reference, and it would even need to rely on a fixed position of the string in the array for the specific string being used in a given location in the code.


Don't we use numbers to reference strings now? Could we not use the unique string ID to index the array?

Here is a potential table layout

STRING_ID (would be the same as the numeric IDs now, would be used to index array.)
S_FILE_ID (we would need to come up with an id system for the files, be it numbers or names.)
LANG_1033
LANG_NNNN

strSql = "SELECT STRING_ID, LANG_" & Lang_ID & " FROM FORUM_LANG WHERE FILE_ID = " & FILE_ID

or something

  Nathan Bales - Romans 15:13
----------------------------------

Snitz Exchange | Do's and Dont's

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 29 March 2002 :  11:40:07
quote:

Don't we use numbers to reference strings now? Could we not use the unique string ID to index the array?



Do we use numbers? It looks to me like we use variable names, right?

quote:


Here is a potential table layout

STRING_ID (would be the same as the numeric IDs now, would be used to index array.)
S_FILE_ID (we would need to come up with an id system for the files, be it numbers or names.)
LANG_1033
LANG_NNNN

strSql = "SELECT STRING_ID, LANG_" & Lang_ID & " FROM FORUM_LANG WHERE FILE_ID = " & FILE_ID



My comment was not related to the table layout.
If you define an array, the indexes are numbered from 0 to a maximum. You access the array by using the index (0->max), not the contents.
If the idea was to load the strings into an array with a single DB call, that array will have index values between 0 and the number of strings read minus 1. You cannot use the STRING_ID to index the array, can you?

-------------------------------------------------
Installation Guide | Do's and Dont's | MODs

Nathan
Help Moderator

USA
7664 Posts

Posted - 29 March 2002 :  12:08:50
quote:
You cannot use the STRING_ID to index the array, can you?


You build the array based on the contents.

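' Lang_ID and FILE_ID are set elsewhere; my_Conn is assumed to be an open ADODB connection.
strSql = "SELECT STRING_ID, LANG_" & Lang_ID & " AS LANG_STRING FROM FORUM_LANG WHERE FILE_ID = " & FILE_ID

Set rs = my_Conn.Execute(strSql)

' arr must be dimensioned large enough to hold the highest STRING_ID.
Do While Not rs.EOF
	arr(rs("STRING_ID").Value) = rs("LANG_STRING").Value
	rs.MoveNext
Loop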

then you can use the string_Id to call the string value.
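
The same idea, one query per page with the results keyed by STRING_ID, can be tried out end to end. This is only a sketch in Python, using the built-in sqlite3 module as a stand-in for the forum database. The table and column names follow the layout proposed above (with FILE_ID as used in the query), while the function `load_strings` and the sample rows are made up for illustration. A dictionary is used instead of an array, so gaps in the IDs do not matter; in VBScript a Scripting.Dictionary could play a similar role.

```python
import sqlite3

def load_strings(conn, lang_id, file_id):
    """Fetch one page's strings in a single query, keyed by STRING_ID."""
    # int() guarantees the interpolated column name is LANG_<digits>.
    column = "LANG_%d" % int(lang_id)
    sql = "SELECT STRING_ID, %s FROM FORUM_LANG WHERE FILE_ID = ?" % column
    return dict(conn.execute(sql, (file_id,)).fetchall())

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FORUM_LANG (STRING_ID INTEGER, FILE_ID INTEGER, LANG_1033 TEXT)"
)
conn.executemany(
    "INSERT INTO FORUM_LANG VALUES (?, ?, ?)",
    [(10, 1, "All Forums"), (20, 1, "Active Topics"), (30, 2, "Search Page")],
)

strings = load_strings(conn, 1033, 1)
print(strings[10])  # All Forums
```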

  Nathan Bales - Romans 15:13
----------------------------------

Snitz Exchange | Do's and Dont's

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 29 March 2002 :  12:15:02
Ok Nathan, I see your point.

Couldn't that in fact be slower than directly assigning string values to string variables, since operations on recordsets are slower than normal variable handling? It is likely, although you would need to measure it, that you end up worse off with that solution than where you started in the first place...

When I read about the DB storage for strings I thought GetRows would be used, but that has the disadvantage I talked about. This approach doesn't, but it will be slower.
-------------------------------------------------
Installation Guide | Do's and Dont's | MODs


Edited by - ruirib on 29 March 2002 12:16:43