Comparison of Old vs. Revised Implementation

All examples are executed with wish8.5 (patch level 15), wish8.6 (patch level 6), and wish8.7 (rev. 776371f4) vs. the revised implementation, the tool memtime (Linux) is used for measurement. All measurements are done on a 32 bit architecture. It is expected that on a 64 bit architecture the percentual increase of memory (compared to standard widget) is a bit lower. About my machine: Pentium(R) Dual-Core CPU T4400 @ 2.20GHz, memory 2 GB. I've compiled with optimization (-O2), and without debug code (-NDEBUG=1).

Example 1: Look & Feel

At first we are looking at a practical example, generated with application Scidb (with the use of command inspect). Note that this script needs the data file "Game.dat". Also note that this example will look a bit weird if the chess fonts are not installed on your system, but this example also works without these fonts.

perf-1.tcl
Game.dat
ScidbChessTraveller.ttf
ScidbSymbolTraveller.ttf

Expand/Collapse Script

Revised: 0.75 s 33,880 kb
wish8.5: 6.97 s 32,272 kb
wish8.6: 9.96 s 36,008 kb
wish8.7: 7.10 s 32,236 kb

The loading time (this also includes the computation of the line metric) is much faster in revised version. But the difference in look & feel is even more important:

  • Try to scroll down, and then scroll up, with old implementation especially up-scrolling feels to be impossible, it hangs for some seconds. The revised implementation is fast and smooth. (Note: when compiled with debug information then it may jerk a bit, because the code contains expensive assertions.)

  • Watch the mouse cursor when doing fast hovering (use the area with the long line). The old implementation has a visible delay, but the new implementation is reacting immediately (especially when compiled without assertions).

Example 2: Scrolling Some Lines

Now we test the line scrolling behavior with same data (see first example), the following script scrolls down and up.

perf-2.tcl

Expand/Collapse Script

Revised: 0.88 s 33,880 kb
wish8.5: 47.94 s 32,272 kb
wish8.6: 88.90 s 35,468 kb
wish8.7: 80.85 s 32,500 kb

It's obvious, the old implementation has severe problems with long lines. I think it's clear that the old implementation is more or less useless when long lines are involved, even the first example is demonstrating this.

Example 3: Typing Text

The user is simply typing text, 10,000 characters, short lines.

perf-3.tcl

Expand/Collapse Script

Revised: 2.03 s 26,640 kb
wish8.5: 1,61 s 25,868 kb
wish8.6: 1,98 s 37,288 kb
wish8.7: 1,79 s 25,836 kb

No significant difference, although insertion of text in revised version is more complex, especially if newlines are involved, like in this test case.

This seems to be the right place to explain one of the differences between wish8.5 and higher versions. The overhead of a function call has been increased since wish8.6, and this overhead does not belong to the text widget, it belongs to the core of the Tcl library. Due to my measurements the overhead of a simple function call is about 30% slower compared to wish8.5. Due to Tcl Performance the non-recursive execution engine imposes a measurable performance penalty; see also Performance difference tcl 8.5 vs 8.6. I think that also the fact, that wish8.5 is not using threads like wish8.6/7, might play a role here.

Example 4: Text with Marks, with Option -steadymarks

Appending character by character, and setting many marks with left gravity, when option -steadymarks is set (possible only in revised version).

perf-4.tcl

Expand/Collapse Script

Revised: 0.52 s 27,172 kb
wish8.5: 5.43 s 26,264 kb
wish8.6: 18,89 s 37,928 kb
wish8.7: 5,47 s 26,412 kb

Setting option -steadymarks has a significant impact on the speed when working with marks. In my opinion the behavior with -steadymarks enabled is more natural than the old behavior (if this option is disabled). And with computed editor layout enabling this option is a must.

Example 5: Undo and Redo

Appending character by character, then undo all changes, and then redo all undoes.

perf-5.tcl

Expand/Collapse Script

Revised: 0.48 s 27,604 kb
wish8.5: 8.02 s 39,404 kb
wish8.6: 15.34 s 39,448 kb
wish8.7: (infinite) (not measurable)

The old mechanism is very slow, the revised implementation is directly working with the segment chains, the difference in speed is enourmous. Even the memory handling is better with new implementation.

Note that performance tests with wish8.7 are not possible. The undo operations now are computing the affected ranges, and the implementation of this feature seems to have leastwise polynomial computation time.

Example 6: Append to Long Line

Appending character by character to same (long) line.

perf-6.tcl

Expand/Collapse Script

Revised: 1.90 s 26,456 kb
wish8.5: 2.20 s 25,872 kb
wish8.6: 2.31 s 28,008 kb
wish8.7: 2.24 s 25,692 kb

The old implementation needs significantly more allocations (and deallocations) than the revised implementation in this case, so finally the revised implementation is a little bit faster, although inserting a character is a bit more complex with revised implementation.

Example 7: Append Tagged Characters to Long Line

Appending character by character to same (long) line, but now every character will be tagged, in total only 10 different tags will be used.

perf-7.tcl

Expand/Collapse Script

Revised: 0.46 s 26,640 kb
wish8.5: 5.79 s 26,120 kb
wish8.6: 22.28 s 31,752 kb
wish8.7: 6.34 s 26,124 kb

The behavior of the old implementation is a catastrophe, especially the performance killer CleanupLine is responsible for this desaster, but also SplitSeg performs poorly.

Note that this example is not unrealistic: for example, the user will show some data from scientific test series, quite often this kind of data has no newlines.

Example 8: Inserting One Million Lines

Inserting 1,000,000 lines.

perf-8.tcl

Expand/Collapse Script

Revised: 3.19 s 220,820 kb
wish8.5: 2,76 s 119,432 kb
wish8.6: 3.72 s 121,416 kb
wish8.7: 3.80 s 119,448 kb

This shows that the revised implementation needs more memory for line management, and B-Tree management. But take into account that this example is a bit extreme, it is demonstrating (nearly) the worst case. And remember that the polynomial computation time (long line problem) of old implementation has been eliminated, but the memory increase is only linear.

Example 9: 'get -displaychars' with Hidden Lines

Inserting 1,000,000 lines, hiding (adding elision) most lines, and applying command get -displaychars 100 times.

perf-9.tcl

Expand/Collapse Script

Revised: 1.72 s 221,312 kb
wish8.5: 53.59 s 119,432 kb
wish8.6: 48.13 s 121,416 kb
wish8.7: 45.42 s 119,448 kb

This is one example that the handling of elided text is very fast in revised version.

Example 10: Positioning the Insert Cursor

Inserting a longer line, each character is tagged, then positioning the insert cursor 100 times.

perf-10.tcl

Expand/Collapse Script

Revised: 0.63 s 26,884 kb
wish8.5: 9.26 s 25,964 kb
wish8.6: 9.33 s 26,516 kb
wish8.7: 9.25 s 25,968 kb

The old display stuff is slow, even when simply re-drawing the insert cursor.

Example 11: Counting Display Lines

Count the number of display lines between randomly chosen text positions (with use of count -displaylines).

perf-11.tcl

Expand/Collapse Script

Revised: 2.32 s 27,172 kb
wish8.5: 23.65 s 26,252 kb
wish8.6: 24.96 s 27,324 kb
wish8.7: 26.77 s 26,268 kb

This is another example about the increase of speed in display stuff. However, the old implementation of count -displaylines is a bit of a desaster.

Example 12: Tagging Large Ranges

Adding and removing tags within large ranges.

perf-12.tcl

Expand/Collapse Script

Revised: 6.44 s 102,248 kb
wish8.5: 5.30 s 40,302 kb
wish8.6: 5.46 s 40,680 kb
wish8.7: 5.68 s 38,208 kb

As already mentioned (on page New Implementation of Tag Handling), adding and removal of tags now is more expensive, especially with large tag ranges – but the lookup of tag information is super-fast with new implementation.

This is another test case which shows one drawback of the new implementation, it is consuming more memory. Currently this is the only known drawback.

Example 13: Tagging Short Ranges

Add and delete one tag repeatedly, with the use of tag add/delete, only short ranges.

perf-13.tcl

Expand/Collapse Script

Revised: 1.32 s 26,964 kb
wish8.5: 0.68 s 25,840 kb
wish8.6: 0.76 s 25,948 kb
wish8.7: 0.76 s 25,952 kb

The revised version is about 100% slower when adding/deleting tags. The lookup of tags is super-fast now (the first two test cases are demonstrating this), but adding/deleting has slowed down a bit, the new tag handling implementation is quite more complex.

Example 14: Searching Tagged Ranges

Search forward for tagged ranges, with the use of tag nextrange.

perf-14.tcl

Expand/Collapse Script

Revised: 0.99 s 26,964 kb
wish8.5: 0.81 s 25,840 kb
wish8.6: 0.78 s 25,948 kb
wish8.7: 0.82 s 25,952 kb

No difference in time when searching tags. It's important that searching for tagged ranges is still fast, because the handling of tags has been completely changed, and the new search algorithm is more complicated than the old one (but the complexity in time is still the same).

Example 15: Re-use of Text Widget

Inserting some short lines with tagged text, we use 100 different tags (not very much). Note that this script expects a loop count as parameter. The widget will be cleared in each iteration step.

perf-15.tcl

Expand/Collapse Script

Firstly we test with loop count 1: "wish perf-15.tcl 1".

Revised: 0.40 s 26,456 kb
wish8.5: 0.33 s 25,872 kb
wish8.6: 0.38 s 27,108 kb
wish8.7: 0.34 s 25,804 kb

Secondly we test with loop count 10: "wish perf-15.tcl 10".

Revised: 0.54 s 26,596 kb
wish8.5: 1.75 s 25,840 kb
wish8.6: 2.12 s 38,140 kb
wish8.7: 1.98 s 25,836 kb

Thirdly we test with loop count 100: "wish perf-15.tcl 100".

Revised: 1.91 s 26,596 kb
wish8.5: 16.36 s 25,868 kb
wish8.6: 19.28 s 149,136 kb
wish8.7: 18.53 s 25,804 kb

This shows that the re-use of a text widget (after deleting the whole content) performs much better with revised version, the difference is enormous. The more tags, the higher the performance improvement (based on tests).