Miva, Miva Script, Miva Empresa, Miva Mia amd Miva Merchant are registered trademarks of the Miva Corporation
 
Ivo Truxa - truXoft control systems: advanced programming and custom IT solutions home / about / webdesign / Miva / automation / contact

http://mivo.truxoft.com
MIVO!
miva beyond limits

 

MIVA®  RESOURCES:  Databases Performance Notes

by Ivo Truxa, 01/08/2001

Some aspects of database performance were discussed also on the Miva User List. Below, you may find copy of the correspondence.

  1. John Tuach's "Indexing Efficiency" query
  2. Brief explanation of optimizing Miva indexes
  3. Second John's query
  4. Detailed explanation of the Miva database performance background
  5. Example of an advanced application of MvFILTER in a real script
  6. User Comments


From: John Tuach
Date: Sun, 7 Jan 2001 19:52:35 +0000
Subject: [meu] Indexing Efficiency

I have just read Ivo's treatise on the correlation between index file size and processing speed as I am looking for ways of introducing efficiencies into a new application which will require a database of up to 200,000 records.

Given that the smaller the file size the faster the processing, what is the most efficient key type to use to reference this number of records?

I would normally use a sequential numeric key but thought I would ask anyway. The long integer required uses 4 bytes to store the key value but would there be a significant efficiency gain in using a 3 character record key instead?

Regards

John Tuach
Website: www.uk-golf.com


top


From: Ivo Truxa
Date: Sun, 7 Jan 2001 22:07:33 +0100
Subject: RE: [meu] Indexing Efficiency

John,

I am not sure if I wrote it clear enough on my website, but changing of the key length (I mean changing it in the database definition) is not enough for reducing the index size. You have to manipulate the index in a hex editor and to be sure to avoid MvREINDEX (or applying the hack afterwards again). Be sure to read also the paragraph above the one about reducing the index size:

http://truxoft.com/miva/art0027.htm#long

The best way to find out if reducing the index size is worth of the hassle with hacking the file, is to test it on a working database on the real website. There will be for sure a difference of search performance, but it is a question if it is of any interest, in your case. If for example the search normally takes 8 sec and with small index just 4 sec, but it takes 20 sec to parse the rest of the script, download the page, the graphics and displaying the results, then I am not sure if it is what you are looking for.

BTW: the same as I wrote about the index file size is valid for the database size as well. Often, it can help a lot if you split your database into two parts - a small master database with just the most used fields and an additional one with complementary information (that you do not use always). If the system is able to keep big part of the database in the cache (or in file access buffers), it can skip without any new disk access to the next record(s) and speed up the application considerably. Especially such operations like MvFILTER and MvSKIP would extremely profit from the reduced database size.

Regular resorting of the database physically with the most used index is another excellent measure for improving the database performance.

(see: http://truxoft.com/miva/showcode.htm?doc=resort.txt)

Ivo


top


From: John Tuach
Date: Mon, 8 Jan 2001 11:11:24 +0000
Subject: RE: [meu] Indexing Efficiency

Ivo...

What I was trying to establish were some ground rules as regards the optimum type of index to use. I appreciate that other factors will determine overall efficiency but I felt that if I knew and used the best type of index then other issues such as database structure could be treated separately and tuned thereafter.

To try and provide myself with some 'empirical' data, I created a database of 100,000 records where the record format was as follows;

Key1 3 Characters(Base 62 - values equivalent to 1 to 100000)
key2 11 Characters(sequential based on dyn_time_t padded to 11)
Key3Long Integer(1 to 100000)

I then created index files based on each field and used each in turn to perform 3 MvFIND reads looking for records 100000, 1 and 50000 inside of a 5000 loop.

At the end of that process I seem to have proved nothing!

The database size was 2.5Mb and each of the index files were round about 25Mb. The time taken by my 400Mz W98 PC to perform the 15,000 retrievals using each index type was 29 seconds apart from the Base62 index which was marginally faster.

So the moral seems to be that the choice of index type and its length makes little or no difference.

Regards
John Tuach


top


From: Ivo Truxa
Date: Mon, 8 Jan 2001 15:39:07 +0100
Subject: RE: [meu] Indexing Efficiency

John,

Let me explain it again. I hope I express myself clearer this time:

If you want to improve the performance of your indexes, it is NOT enough to change the database field size. You must edit the index header and you have to do it on an empty index and afterwards adding all the database records into the index. Only in this way you get smaller and correct indexes that will speed up your database. You should be able to reduce the INDEX size (not only the database size) to some 1MB.

Reducing the database size helps to improve the performance too. Although it have no big influence on the single access functions (MvGO, MvFIND), it has very great effect on sequential database accesses - this is especially the case at any database operations with active FILTER and at MvSKIP commands or sequential accessing of database records in MvWHILE.

In contrary to the effect of the DATABASE size, that is important especially for sequential reading of multiple records, the change of the INDEX size accelerates also single access commands (MvFIND, MvGO).

The main reason of the performance boost is mechanical:

Processor instructions take typically time in the range of nanoseconds, where reading a disc sector (typically 512 bytes) can be measured in microseconds (or tens of them) - about 1000-10,000 slower. A random access to another disk sector is even worse - the hardisk has to re-position the heads and let the disk turn till the desired sector appears under the heads - it takes even much more time. Typically a random access takes about 10 miliseconds - again 1000 times more time than a plain reading of the record from the disk or almost 1,000,000 more than accessing it from fast RAM or cache.

This is the reason modern drives and operating systems buffer more data than requested into the cache. The data is then available for immediate access. Sequential reading even of a huge part of a database (or index) is therefore much quicker than random accesses (supposing defragmented disk).

It is evident that the bigger your files are, the higher is the probability and frequency of disk accesses with the necessity to move the heads and therefore much higher latency.

So, if we want to apply this theory to Miva database performance then we have to speak about both files - the database AND the index. Opening the database files is rather important and takes a lot of time - each file means typically one, but more probably several random disk accesses - so typically the database opening could take hundred(s) milliseconds. The more indexes you are opening, the longer it takes.

When you open the index, as first Miva reads the header at the beginning of the file and then it goes to the root of the binary tree that is approximately in the middle of the file. If your index is 20Megs big, there is no chance that it could be done from the cache or in one disk cylinder turn. At an index of 1Meg (it means 512kB to the index root), the probability to get to the index root without another disk access is still not too high too (depends on the hardware, OS and other things), but the subsequent searching through the index would profit from the reduced size a lot.

The index is split into blocks (index nodes) and during each database access Miva has to read minimally one, but mostly several such blocks. Due to the balance tree structure, the subsequent blocks are often close to each other and therefore may be read much quicker. Again - the chance to read all blocks during a single search from the cache or without repositioning the heads is much higher at tiny key size that at the default one. The more records there are in the database, the bigger performance gain you will see.

Miva steps through the index at any database access command (even at MvGO!!), so the reduced index size will show well in all functions.

Reducing the database itself helps especially at subsequent reading of several records. A random access to database records would not be faster (I am ignoring now the index performance gain), because typically each record means a new disk access. However, as soon as you read several neighbor records, you will get better results with smaller records. If you maintain your database well and keep it physically sorted by the most used index, the performance gain would be really significant. Remember that especially commands like MvSKIP (and especially with higher ROW values) and any database operation after applying MvFILTER, access database in this way - reading sequentially multiple records.

Behind the size, using of MEMO fields is another important factor for the database performance. MEMO fields are stored unsorted in another file and typically each record access means that the disk has to move heads again to get the value. It may be true even at neighbor records of a well-maintained database. I firmly believe (but did not tested yet) that Miva reads the MEMO values from the files regardless if you are going to use them or not and therefore, even skipping or filtering such database means longer times. That's why I was speaking about splitting databases into several parts with keys important for indexing, searching and filtering apart from complementary information (possibly in MEMO fields).

In multi-user and multitasking environment of web servers, the effect of the performance gain should be even more significant.

Ivo

PS: If you measure performance in a MvWHILE loop, do not forget to deduct the empty loop time from the total, to get proper results.


top


From: Ivo Truxa
Date: Thu, 11 Jan 2001 22:53:38 +0100
Subject: [meu] MvFILTER vs MvFIND

Lately I posted several times advocating of MvFILTER, but I am afraid I still did not explain it well. Even the description on my website misses a concrete example. So let's show the correct but undocumented use of MvFILTER.

As a model I took Don's script for picking topics from his picture database. He just posted it in another related thread (Indexing Tokens). My translation does nothing more than his original one, but it uses MvFILTER to make it quicker and also in using less code.

Let's explain the sorting and looping with MvFILTER in details, now. First I show the complete script and below I analyze it with comments and explanations:

Note: I use to write long lines of code and the may get broken on the way

THE COMPLETE SCRIPT:

<MvASSIGN NAME="l.chr" VALUE="A">
<MvWHILE EXPR="{l.chr LE 'Z'}">
  <MvEVAL EXPR="{'<a href="http://miva.truxoft.com/art0036.htm?letter=' $ l.chr $ '">' $ l.chr}"></a> |
  <MvASSIGN NAME="l.chr" VALUE="{asciichar(asciivalue(l.chr)+1)}">
</MvWHILE>

<MvIF EXPR="{g.letter}">
 <h2>Index: <MvEVAL EXPR="{g.letter}"></h2>
 <ul>
  <MvOPEN NAME="db" DATABASE="{database}" INDEXES="{topic_index}">
  <MvASSIGN NAME="g.lastTopic" VALUE="">
  <MvASSIGN NAME="g.stop" VALUE="1">
  <MvFILTER NAME="db" FILTER="{g.stop OR topic(db.d.recno,db.d.name,db.d.topic) 
                                         OR (g.letter CIN db.d.name) NE 1}">
  <MvFIND NAME="db" VALUE="{g.letter}">
  <MvASSIGN NAME="g.stop" VALUE="0">
  <MvSKIP NAME="db">
  <MvCLOSE>
 </ul>
</MvIF>

<MvFUNCTION NAME="topic" PARAMETERS="rec,name,topic">
 <MvIF EXPR="{NOT g.stop AND tolower(g.lastTopic) NE tolower(l.topic)}">
  <li><a href="script.mv?arg=&[l.rec];"><MvEVAL EXPR="{l.name $ ': ' $ l.topic}"></a></li>
  <MvASSIGN NAME="g.lastTopic" VALUE="{l.topic}">
 </MvIF>
</MvFUNCTION>
THE EXPLANATION:

The following part only prints the alphabetic navigation bar and has nothing to do with the MvFILTER algorithm:

<MvASSIGN NAME="l.chr" VALUE="A">
<MvWHILE EXPR="{l.chr LE 'Z'}">
  <MvEVAL EXPR="{'<a href="http://miva.truxoft.com/art0036.htm?letter=' $ l.chr $ '">' $ l.chr}"></a> |
  <MvASSIGN NAME="l.chr" VALUE="{asciichar(asciivalue(l.chr)+1)}">
</MvWHILE>

Opening the database with the desired database index:

<MvIF EXPR="{g.letter}">
 <h2>Index: <MvEVAL EXPR="{g.letter}"></h2>
 <ul>
  <MvOPEN NAME="db" DATABASE="{database}" INDEXES="{topic_index}">

'lastTopic' variable is use to avoid displaying the same topic multiple times (it was not in the original script):

  <MvASSIGN NAME="g.lastTopic" VALUE="">

VERY IMPORTANT: the 'stop' variable avoids Miva skipping through the database when processing the MvFILTER tag:

  <MvASSIGN NAME="g.stop" VALUE="1">

Explanation: When you apply FILTER, Miva skips through the database as long as there is no match (the filter expression returns FALSE). If you force the FILTER expression to return TRUE, Miva stops immediately without changing the recno.

Now, we apply the desired filter with the 'stop' variable equal 1 - effectively stopping the filtering immediately:

  <MvFILTER NAME="db" FILTER="{g.stop OR topic(db.d.recno,db.d.name,db.d.topic) 
                                    OR (g.letter CIN db.d.name) NE 1}">

Now, we use the advantage of the very fast MvFIND to access the first matching record

  <MvFIND NAME="db" VALUE="{g.letter}">

Setting the 'stop' variable to FALSE allows the actual filter expression to work:

  <MvASSIGN NAME="g.stop" VALUE="0">

And with the skip command we let Miva apply the filter expression onto the database. As you can see in the filter expression, Miva will loop through the database as long as the first letter of the topic field is equal to the selected letter (the variable 'letter') and in the same time it will process the topic() function (see explanation below for this trick)

  <MvSKIP NAME="db">

Ending the script:

  <MvCLOSE>
 </ul>
</MvIF>

This is another tricky (and absolutely not obligatory) use of MvFILTER - you can use it not only to filter database records, but also to perform very quick loops through the database. As long as the function returns nothing, zero or FALSE, Miva will go on looping the database and executing your function.

<MvFUNCTION NAME="topic" PARAMETERS="rec,name,topic">

We display the topic only if the 'stop' variable was already cleared to allow filtering and only if the current topic was not displayed yet:

 <MvIF EXPR="{NOT g.stop AND tolower(g.lastTopic) NE tolower(l.topic)}">

This is the topic linked to its URL:

  <li><a href="script.mv?arg=&[l.rec];"><MvEVAL EXPR="{l.name $ ': ' $ l.topic}"></a></li>

We have to uptade the 'lastTopic' variable to avoid displaying the topic again:

  <MvASSIGN NAME="g.lastTopic" VALUE="{l.topic}">
 </MvIF>
</MvFUNCTION>

NOTE: for some reasons, the database variable names are not correctly interpreted if used inside of the function - that's why I am passing them through the arguments. Another limitation is that in this way, MvFILTER seems to work only directly in the body of a Miva script. If you try to use MvFILTER that uses a non-Miva-native function as FILTER expression, inside of another function, it won't work correctly.

I hope this concrete example helps people to speed up their search algorithms,


top

Some Useful Links

Speeding up MvFILTER
Miva Index Secrets
Miva Databases Surprises
Miva Script Manual: Using Databases
What is X-Base


top

   

Miva and some other terms used on this page are registerd trademarks of the Miva Corporation
copyright  truXoft  © 1997-2008