All examples are executed with wish8.5 (patch level 15), wish8.6 (patch level 6), and wish8.7 (rev. 776371f4) vs. the revised implementation, the tool memtime (Linux) is used for measurement. All measurements are done on a 32 bit architecture. It is expected that on a 64 bit architecture the percentual increase of memory (compared to standard widget) is a bit lower. About my machine: Pentium(R) Dual-Core CPU T4400 @ 2.20GHz, memory 2 GB. I've compiled with optimization (-O2), and without debug code (-NDEBUG=1).
At first we are looking at a practical example, generated with application Scidb (with the use of command inspect). Note that this script needs the data file "Game.dat". Also note that this example will look a bit weird if the chess fonts are not installed on your system, but this example also works without these fonts.
perf-1.tcl | |
Game.dat | |
ScidbChessTraveller.ttf | |
ScidbSymbolTraveller.ttf |
Revised: | 0.75 s | 33,880 kb | |
wish8.5: | 6.97 s | 32,272 kb | |
wish8.6: | 9.96 s | 36,008 kb | |
wish8.7: | 7.10 s | 32,236 kb |
The loading time (this also includes the computation of the line metric) is much faster in revised version. But the difference in look & feel is even more important:
Try to scroll down, and then scroll up, with old implementation especially up-scrolling feels to be impossible, it hangs for some seconds. The revised implementation is fast and smooth. (Note: when compiled with debug information then it may jerk a bit, because the code contains expensive assertions.)
Watch the mouse cursor when doing fast hovering (use the area with the long line). The old implementation has a visible delay, but the new implementation is reacting immediately (especially when compiled without assertions).
Now we test the line scrolling behavior with same data (see first example), the following script scrolls down and up.
perf-2.tcl |
Revised: | 0.88 s | 33,880 kb | |
wish8.5: | 47.94 s | 32,272 kb | |
wish8.6: | 88.90 s | 35,468 kb | |
wish8.7: | 80.85 s | 32,500 kb |
It's obvious, the old implementation has severe problems with long lines. I think it's clear that the old implementation is more or less useless when long lines are involved, even the first example is demonstrating this.
The user is simply typing text, 10,000 characters, short lines.
perf-3.tcl |
Revised: | 2.03 s | 26,640 kb | |
wish8.5: | 1,61 s | 25,868 kb | |
wish8.6: | 1,98 s | 37,288 kb | |
wish8.7: | 1,79 s | 25,836 kb |
No significant difference, although insertion of text in revised version is more complex, especially if newlines are involved, like in this test case.
This seems to be the right place to explain one of the differences between wish8.5 and higher versions. The overhead of a function call has been increased since wish8.6, and this overhead does not belong to the text widget, it belongs to the core of the Tcl library. Due to my measurements the overhead of a simple function call is about 30% slower compared to wish8.5. Due to Tcl Performance the non-recursive execution engine imposes a measurable performance penalty; see also Performance difference tcl 8.5 vs 8.6. I think that also the fact, that wish8.5 is not using threads like wish8.6/7, might play a role here.
Appending character by character, and setting many marks with left gravity, when option -steadymarks is set (possible only in revised version).
perf-4.tcl |
Revised: | 0.52 s | 27,172 kb | |
wish8.5: | 5.43 s | 26,264 kb | |
wish8.6: | 18,89 s | 37,928 kb | |
wish8.7: | 5,47 s | 26,412 kb |
Setting option -steadymarks has a significant impact on the speed when working with marks. In my opinion the behavior with -steadymarks enabled is more natural than the old behavior (if this option is disabled). And with computed editor layout enabling this option is a must.
Appending character by character, then undo all changes, and then redo all undoes.
perf-5.tcl |
Revised: | 0.48 s | 27,604 kb | |
wish8.5: | 8.02 s | 39,404 kb | |
wish8.6: | 15.34 s | 39,448 kb | |
wish8.7: | (infinite) | (not measurable) |
The old mechanism is very slow, the revised implementation is directly working with the segment chains, the difference in speed is enourmous. Even the memory handling is better with new implementation.
Note that performance tests with wish8.7 are not possible. The undo operations now are computing the affected ranges, and the implementation of this feature seems to have leastwise polynomial computation time.
Appending character by character to same (long) line.
perf-6.tcl |
Revised: | 1.90 s | 26,456 kb | |
wish8.5: | 2.20 s | 25,872 kb | |
wish8.6: | 2.31 s | 28,008 kb | |
wish8.7: | 2.24 s | 25,692 kb |
The old implementation needs significantly more allocations (and deallocations) than the revised implementation in this case, so finally the revised implementation is a little bit faster, although inserting a character is a bit more complex with revised implementation.
Appending character by character to same (long) line, but now every character will be tagged, in total only 10 different tags will be used.
perf-7.tcl |
Revised: | 0.46 s | 26,640 kb | |
wish8.5: | 5.79 s | 26,120 kb | |
wish8.6: | 22.28 s | 31,752 kb | |
wish8.7: | 6.34 s | 26,124 kb |
The behavior of the old implementation is a catastrophe, especially the performance killer CleanupLine is responsible for this desaster, but also SplitSeg performs poorly.
Note that this example is not unrealistic: for example, the user will show some data from scientific test series, quite often this kind of data has no newlines.
Inserting 1,000,000 lines.
perf-8.tcl |
Revised: | 3.19 s | 220,820 kb | |
wish8.5: | 2,76 s | 119,432 kb | |
wish8.6: | 3.72 s | 121,416 kb | |
wish8.7: | 3.80 s | 119,448 kb |
This shows that the revised implementation needs more memory for line management, and B-Tree management. But take into account that this example is a bit extreme, it is demonstrating (nearly) the worst case. And remember that the polynomial computation time (long line problem) of old implementation has been eliminated, but the memory increase is only linear.
Inserting 1,000,000 lines, hiding (adding elision) most lines, and applying command get -displaychars 100 times.
perf-9.tcl |
Revised: | 1.72 s | 221,312 kb | |
wish8.5: | 53.59 s | 119,432 kb | |
wish8.6: | 48.13 s | 121,416 kb | |
wish8.7: | 45.42 s | 119,448 kb |
This is one example that the handling of elided text is very fast in revised version.
Inserting a longer line, each character is tagged, then positioning the insert cursor 100 times.
perf-10.tcl |
Revised: | 0.63 s | 26,884 kb | |
wish8.5: | 9.26 s | 25,964 kb | |
wish8.6: | 9.33 s | 26,516 kb | |
wish8.7: | 9.25 s | 25,968 kb |
The old display stuff is slow, even when simply re-drawing the insert cursor.
Count the number of display lines between randomly chosen text positions (with use of count -displaylines).
perf-11.tcl |
Revised: | 2.32 s | 27,172 kb | |
wish8.5: | 23.65 s | 26,252 kb | |
wish8.6: | 24.96 s | 27,324 kb | |
wish8.7: | 26.77 s | 26,268 kb |
This is another example about the increase of speed in display stuff. However, the old implementation of count -displaylines is a bit of a desaster.
Adding and removing tags within large ranges.
perf-12.tcl |
Revised: | 6.44 s | 102,248 kb | |
wish8.5: | 5.30 s | 40,302 kb | |
wish8.6: | 5.46 s | 40,680 kb | |
wish8.7: | 5.68 s | 38,208 kb |
As already mentioned (on page New Implementation of Tag Handling), adding and removal of tags now is more expensive, especially with large tag ranges – but the lookup of tag information is super-fast with new implementation.
This is another test case which shows one drawback of the new implementation, it is consuming more memory. Currently this is the only known drawback.
Add and delete one tag repeatedly, with the use of tag add/delete, only short ranges.
perf-13.tcl |
Revised: | 1.32 s | 26,964 kb | |
wish8.5: | 0.68 s | 25,840 kb | |
wish8.6: | 0.76 s | 25,948 kb | |
wish8.7: | 0.76 s | 25,952 kb |
The revised version is about 100% slower when adding/deleting tags. The lookup of tags is super-fast now (the first two test cases are demonstrating this), but adding/deleting has slowed down a bit, the new tag handling implementation is quite more complex.
Search forward for tagged ranges, with the use of tag nextrange.
perf-14.tcl |
Revised: | 0.99 s | 26,964 kb | |
wish8.5: | 0.81 s | 25,840 kb | |
wish8.6: | 0.78 s | 25,948 kb | |
wish8.7: | 0.82 s | 25,952 kb |
No difference in time when searching tags. It's important that searching for tagged ranges is still fast, because the handling of tags has been completely changed, and the new search algorithm is more complicated than the old one (but the complexity in time is still the same).
Inserting some short lines with tagged text, we use 100 different tags (not very much). Note that this script expects a loop count as parameter. The widget will be cleared in each iteration step.
perf-15.tcl |
Firstly we test with loop count 1: "wish perf-15.tcl 1".
Revised: | 0.40 s | 26,456 kb | |
wish8.5: | 0.33 s | 25,872 kb | |
wish8.6: | 0.38 s | 27,108 kb | |
wish8.7: | 0.34 s | 25,804 kb |
Secondly we test with loop count 10: "wish perf-15.tcl 10".
Revised: | 0.54 s | 26,596 kb | |
wish8.5: | 1.75 s | 25,840 kb | |
wish8.6: | 2.12 s | 38,140 kb | |
wish8.7: | 1.98 s | 25,836 kb |
Thirdly we test with loop count 100: "wish perf-15.tcl 100".
Revised: | 1.91 s | 26,596 kb | |
wish8.5: | 16.36 s | 25,868 kb | |
wish8.6: | 19.28 s | 149,136 kb | |
wish8.7: | 18.53 s | 25,804 kb |
This shows that the re-use of a text widget (after deleting the whole content) performs much better with revised version, the difference is enormous. The more tags, the higher the performance improvement (based on tests).