Monday, March 18, 2013

Simple QML vs EFL comparison


Recently I found this blog post about a complete Minesweeper clone, elemines, based on the Enlightenment Foundation Libraries (EFL). As EFL is designed to work efficiently even on PDAs, I came up with the idea of implementing a similar clone in plain QML/JavaScript (QmlMiner) and performing a simple comparative analysis. I wondered how the QML version would compare with the EFL one.
The following areas were analyzed:
  1. Developer experience
  2. Source code size and used languages
  3. Memory consumption
  4. Startup time
The comparison was concluded with a limited performance check.
You can look at it from many angles. Just note that I was comparing a virtual-machine-based runtime (Qt4/QML - QtQuick 1.1, JavaScript) with an EFL app that is coded in C and compiled into a native binary, to see how much advantage low-level C programming has over a more modern technology such as QML.


1. Developer experience


While creating QmlMiner, I copied as much of the appearance and functionality of elemines as I could, but decided not to look at the elemines code.

This is what both apps look like:

QmlMiner


elemines


Surprisingly, the QML implementation didn't take much time:
Activity                                    Hours   Comment
Development time spent on JavaScript code   8       I have never written any minesweeper engine
Development time spent on QML code          12      Dialogs, buttons, animations (explosions) etc.

I should also mention my experience with related technologies:
  1. Intermediate Qt knowledge (3 years)
  2. A few months of QML development
  3. Basic JavaScript knowledge (only used with QML code)
Plain Qt knowledge was not a requirement in this case because QmlMiner contains no C/C++ code at all. Understanding QML and simple JS was enough. QmlMiner can be executed with the qmlviewer tool, but afterwards I added a simple main.cpp for the memory and startup performance tests, to make the QML app as lightweight as elemines where possible. Basically, qmlviewer has many features that are not needed for the task.
After creating QmlMiner I reviewed some of the elemines source code to spot similarities. I noticed that both applications have corresponding game engines and that the default.edc (*.edc) file has a syntax similar to JSON. It has a programs section, which I believe is somewhat similar to QML states.
Another observation is that the edc file is a data resource used by the C code. The C code creates objects, imperatively defines interactions between them and lays out the UI. Please note that these are only my guesses, the guesses of a developer with little experience in either plain C or EFL.


2. Source code size and used languages


I measured the size of the source code with the wc command and the SLOCCount tool. SLOCCount was used to count lines of the elemines C code and QmlMiner's C++ code (it skips comments). wc (with the -l option) was used to count lines of the QML files and elemines's default.edc file. The QML files didn't have any comments and the edc file had only 20 comment lines, which I excluded. I assumed that the whole QmlMinerModel.qml file contains JS code - it is the "game engine". The other files describe the GUI (look and behavior). I didn't analyze the build system files (Makefiles in elemines and a .pro file in QmlMiner) because they are not very relevant.

The results are as follows:

Lines of code
Purpose           QmlMiner             EFL elemines
Business logic    JavaScript   137     C     572
UI and behavior   QML          518     EDC   915
Bootstrap         C++          13
Total                          668           1487

3. Memory consumption


Definitions

Before presenting memory consumption and startup statistics, I would like to explain some terms:
  • QML App (with compiled-in resources) - an application compiled with all needed resource files compiled into the binary as Qt resources
  • QML App - a standard QmlMiner application. All external resources are kept outside of the application binary 
  • EFL App - the original elemines 0.1 application 
  • i5 32bit - test machine - Intel i5, 4GB RAM, HDD, Ubuntu 12.10 32bit, Qt 4.8.4, EFL 1.7.4, kernel 3.5.0-25-generic, IceWM 1.3.7
  • i7 32bit - test machine - Intel i7, 8GB RAM, SSD, Ubuntu 12.10 32bit, Qt 4.8.4, EFL 1.7.4, kernel 3.5.0-26-generic, IceWM 1.3.7
  • i5 64bit - test machine - Intel i5, 4GB RAM, HDD, openSUSE 12.2 64bit, Qt 4.8.4, EFL 1.7.5, kernel 3.4.28-2.20-desktop, IceWM 1.3.7
  • i7 64bit - test machine - Intel i7, 6GB RAM, HDD, openSUSE 12.2 64bit, Qt 4.8.4, EFL 1.7.99, kernel 3.4.28-2.20-desktop, IceWM 1.3.7
I would like to point out the main differences that can have an impact on the test results:
  • i7 64bit has the newest EFL version. According to this article, there is a high possibility that this version has a bigger memory footprint than the older ones.
  • i7 32bit has an SSD drive. Results of the cold run tests differ significantly on the other test machines (which have no SSD).

Tools

I used the ksysguard app for memory analysis. To be sure that it is a trustworthy tool, I also used smem for the first samples; the results were the same. All the binaries were examined just after starting and showing the main window, without performing any interactive steps. A note: I have already worked on benchmarks in my professional career. A lightweight IceWM session was used, with no background tasks running that would interfere with the test, so most of the risk of blurring the results by a desktop environment such as E17 or KDE Plasma Workspaces has been reduced. Additional explanations:
  • Measure unit - KiB
  • Private - memory used only by the examined process
  • Shared - memory that can be shared between processes (e.g. shared libraries' own memory)
  • Rss (Resident set size) - Private + Shared - shown in /proc/<pid>/status under VmRSS
  • Pss (Proportional set size) - Private + Shared/(number of processes) - lowers if more processes use the same shared libraries
  • Swap - memory swapped out to the disk
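The terms above map onto the per-mapping counters that the Linux kernel exposes in /proc/<pid>/smaps. The following is only a minimal sketch of how such totals could be summed; it is not how ksysguard or smem are implemented, and the names MemoryTotals and sumSmaps are made up for this illustration. The kernel labels the values "kB", but they are counted in units of 1024 bytes.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Totals in KiB, named after the terms used in this article.
// (Hypothetical helper type, not part of any tool mentioned here.)
struct MemoryTotals {
    long privateKiB = 0;
    long sharedKiB = 0;
    long rssKiB = 0;
    long pssKiB = 0;
    long swapKiB = 0;
};

// Sums the per-mapping counters found in smaps-formatted text.
// Each relevant line looks like "Pss:                  25 kB".
MemoryTotals sumSmaps(std::istream &in)
{
    MemoryTotals totals;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        long value = 0;
        if (!(fields >> key >> value))
            continue; // mapping header lines and non-numeric fields
        if (key == "Private_Clean:" || key == "Private_Dirty:")
            totals.privateKiB += value;
        else if (key == "Shared_Clean:" || key == "Shared_Dirty:")
            totals.sharedKiB += value;
        else if (key == "Rss:")
            totals.rssKiB += value;
        else if (key == "Pss:")
            totals.pssKiB += value;
        else if (key == "Swap:")
            totals.swapKiB += value;
    }
    return totals;
}
```

To try it on a live process, the function could be fed a std::ifstream opened on /proc/self/smaps (or on another process's smaps file, given sufficient permissions).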

PSS is the most important value as it reflects the real memory usage of an application: the shared memory value is divided by the number of processes that use it. The following tables show how the PSS of every compared application changes on the different test machines.
Memory consumption comparison - 1 application instance
Measure unit [KiB]   QML App   QML App with compiled-in resources   EFL App
i5 32bit, HDD
  Private            12248     12284                                13180
  Shared             15300     15300                                2620
  Rss                27548     27584                                15800
  Pss                18498     18534                                13606
  Swap               0         0                                    0
i7 32bit, SSD
  Private            12404     12388                                13756
  Shared             15504     15504                                3736
  Rss                27908     27892                                17492
  Pss                19119     18786                                14453
  Swap               0         0                                    0
i5 64bit, HDD
  Private            14880     14860                                17848
  Shared             10972     11004                                5096
  Rss                25852     25864                                22944
  Pss                18874     18871                                18765
  Swap               0         0                                    0
i7 64bit, HDD
  Private            14568     14564                                19304
  Shared             10988     11020                                6592
  Rss                25556     25584                                25896
  Pss                18761     18787                                20698
  Swap               0         0                                    0
Memory consumption comparison - 10 application instances
Measure unit [KiB]   QML App   QML App with compiled-in resources   EFL App
i5 32bit, HDD
  Private            7908      7920                                 9808
  Shared             19708     19856                                6348
  Rss                27616     27776                                16156
  Pss                11392     11391                                10725
  Swap               0         0                                    0
i7 32bit, SSD
  Private            8076      8000                                 9908
  Shared             20028     19900                                7600
  Rss                28104     27900                                17508
  Pss                11584     11461                                10903
  Swap               0         0                                    0
i5 64bit, HDD
  Private            9864      9932                                 12716
  Shared             16272     16276                                10012
  Rss                26136     26208                                22728
  Pss                12793     12869                                14125
  Swap               0         0                                    0
i7 64bit, HDD
  Private            9356      9408                                 14172
  Shared             16116     16148                                11728
  Rss                25472     25556                                25900
  Pss                12163     12071                                15798
  Swap               0         0                                    0
Memory consumption comparison - 10 application instances
Measure unit [KiB]   QML App   QML App with compiled-in resources   EFL App
i5 32bit, HDD
  Private            7916      7924                                 9484
  Shared             19760     19856                                6356
  Rss                27676     27780                                15840
  Pss                10037     10060                                9953
  Swap               0         0                                    0
i7 32bit, SSD
  Private            8004      7988                                 10204
  Shared             19892     19980                                7600
  Rss                27896     27968                                17804
  Pss                10128     10124                                10744
  Swap               0         0                                    0
i5 64bit, HDD
  Private            9792      9764                                 12764
  Shared             16272     16084                                10020
  Rss                26064     25848                                22784
  Pss                11630     11575                                13516
  Swap               0         0                                    0
i7 64bit, HDD
  Private            9460      9440                                 14176
  Shared             16384     16380                                11736
  Rss                25844     25820                                25912
  Pss                11285     11195                                15030
  Swap               0         0                                    0
As we can see there are a few interesting results:
  • differences between QmlMiner and QmlMiner with resources compiled in are very small (0-2%) and can be ignored
  • as the number of processes increases, QmlMiner's PSS drops faster than elemines's. With 10 instances, elemines's PSS relative to QmlMiner's is already:
    • 1% lower on i5 32bit
    • 6% higher on i7 32bit
    • 16% higher on i5 64bit
    • 33% higher on i7 64bit
  • elemines consumes more memory than QmlMiner on 64bit platforms (with the exception of the one-instance test on the i5 64bit platform)
  • in the one-instance test on 32bit platforms QmlMiner's PSS is significantly higher (32-36%), but it decreases with the number of processes.
The main conclusion is that elemines occupies much more memory than QmlMiner on the 64bit architecture (up to 33% more). Additionally, it has a low ratio of shared to private memory, and therefore its PSS will not decrease much when the number of processes (that share common code) increases. On the i5 32bit platform one instance of QmlMiner has a 36% higher PSS than elemines; however, on the i7 64bit platform one of 10 elemines instances has a 33% higher PSS than the corresponding QmlMiner instance. We could say that on homogeneous platforms (either Qt for QmlMiner or EFL for elemines), with an increasing number of processes based on the same framework, QmlMiner and other applications based on the Qt Quick technology consume much less memory than elemines thanks to more aggressive code and resource sharing.


4. Startup time


Tools

The time(1) command was used to measure the startup time. As before, a lightweight IceWM session was used, with no background user tasks or costly services running, to avoid interference with the test; the influence of a desktop such as E17 or KDE Plasma Workspaces has thus been reduced. A "warm start" test was performed to measure the "warm" start of the application. The following command was used:
# Append one "system;user;elapsed" line per batch of ten runs to $csvFile
for i in {1..100}; do
    /usr/bin/time -f "%S;%U;%e" -a -o "$csvFile" ./ten_runs_with_caches.sh
done
The ten_runs_with_caches.sh script sequentially invokes the corresponding binary 10 times. This increases the test's precision to three digits, because time(1) returns results with only 2-digit precision; that matters because the measured times tend to be very small on modern machines. "Cold start" tests were also performed to measure the "cold" start of the application. The ten_runs_without_caches.sh script was used to invoke the corresponding binary, dropping caches before every execution by running:
/sbin/sysctl -q vm.drop_caches=3 && ./binary
So for every application on every platform (except i7 64bit), a sample of 1000 executions was collected and the median computed. I had limited access to the i7 64bit test machine, so only a sample of 250 (25x10) executions was gathered there.
To measure the startup time efficiently, I modified the source code of both applications. QmlMiner was forced to exit just before the QApplication::exec() call:
   ...  
   viewer.show();  
   exit(0);  
   return app.exec();  
 }  
and elemines exited just before returning from the gui function:
   ...  
   evas_object_show(window);  
   exit(0);  
   return EINA_TRUE;  
 }  
Additional explanations:
  • Measure unit - seconds, as reported by time(1) (the summary table below is given in milliseconds)
  • System - total number of CPU-seconds used by the system on behalf of the process (in kernel mode)
  • User - total number of CPU-seconds that the process used directly (in user mode)

The following table shows the summary values for the System and User time spent by each application in the different environments.
QML App, QML App with compiled-in resources and EFL App startup time on different targets
Measure unit [ms]                        Warm start                     Cold start
                                         System  User   System+User    System  User   System+User
i5 32bit, HDD
  QML App                                28      124    152            80      162    242
  QML App with compiled-in resources     27      123    150            80      163    243
  EFL App                                20      92     112            74      123.5  197.5
i7 32bit, SSD
  QML App                                17      97     114            47.5    98     145.5
  QML App with compiled-in resources     18      97     115            47      98     145
  EFL App                                14      67     81             38      60     98
i5 64bit, HDD
  QML App                                14      91     105            55      116    171
  QML App with compiled-in resources     15      91     106            55      115    170
  EFL App                                14      56     70             55      86     141
i7 64bit, HDD
  QML App                                14      90     104            46      106    152
  QML App with compiled-in resources     15      91     105            47      106    153
  EFL App                                24      83     107            58      87     145

There is no significant difference between the startup times of the two versions of QmlMiner. I suppose that if the test were performed on embedded devices, more dissimilarity could be seen. The relative difference between QmlMiner and elemines varies a lot between the test cases. In the i5 64bit warm run test QmlMiner starts 50% slower than elemines, but in absolute terms that is only 35 ms (in the other cases, apart from i7 64bit, it varies from 30 to 47.5 ms). I had expected much higher differences because Qt has to initialize the QML engine and the JavaScript engine, and has to parse and compile the QML source files to QML bytecode. In the i7 64bit warm run test QmlMiner starts about 2-3% faster than elemines. This could be caused by the more recent EFL version used to compile elemines (1.7.99, versus 1.7.4 and 1.7.5 on the other platforms). Another observation is that the 64bit builds of the applications start faster than their 32bit builds.

Summary: 


I am amazed how quickly QmlMiner could be implemented. Originally it even offered some hidden features, but I removed them on purpose because QmlMiner should be as similar to the EFL-based elemines as possible for the comparison. For example, the QtQuick implementation has a dimension parameter that changes the number of board elements (I have played on a 50x50 board). One can also change the number of bombs using a "bombCount" parameter.

Taking a more scientific approach to comparing developer experience, an average-bug-count-per-1000-lines metric could count as a point for QtQuick. Specifics of the C language are used in various models for estimating the workload of C-based projects, e.g. COCOMO in the SLOCCount tool. There is no such estimation for the QML language as of now, but most software engineers familiar with the topic would say that QML is clearly a higher-level language than C, so writing an application in QML is much more organized and less error-prone than doing so in plain C. While I suppose that the EFL edc file includes some declarative code for the application's behavior, unfortunately I could not spot anything in the code. I did not go through the EFL docs and I am not sure if I will do this in the future. You are welcome to do so.

The startup times are relatively short for both applications (the highest difference is 47.5 ms). Proportional memory consumption is comparable, and QtQuick has an advantage on homogeneous platforms thanks to full portability (binary independence). It also performs well on 64bit architectures. In such basic applications there is hardly any way of measuring performance other than perhaps FPS while resizing. I have noticed issues with elemines resizing (slow refreshing of the window's content). I have asked a question on the Enlightenment forum about this issue (which could be caused by a broken compilation of EFL), but I am still waiting for a precise answer.
All this looks like a big eye-opener for QtQuick skeptics, especially because I was comparing:
  1. A QtQuick app that uses QML code on a system-independent, virtual-machine-based runtime, parsed/compiled to bytecode at runtime (details for QML2 at http://www.kdab.com/qml-engine-internals-part-1-qml-file-loading/), with business logic written in JavaScript. Using Qt/C++ here is possible for performance reasons, but that wasn't necessary for this test.
  2. An EFL app written in plain C, optimized at compile time by GCC, possibly for the given CPU and operating system, with business logic written in C as well. (Note: there is elev8, an early effort at JavaScript bindings for EFL, but it's not mentioned in the official documentation. Until such projects reach a stable milestone, I see EFL's approach to GUI programming as more compiled-in or "static" than QtQuick's approach.) EFL's Elementary graphical framework is not extensible at runtime, so new components cannot be added without going back to the C compiler.
Taking these points into account, it is surprising to see a QtQuick app performing similarly to a C-based app made with EFL. In addition, QtQuick introduces useful features not present in the EFL app (binary independence, network transparency, safer memory operations) without sacrificing performance (compared to EFL). Furthermore, these tests can be repeated for Qt 5/QML2, which is reportedly even more optimized.
If you are an EFL or QML expert or enthusiast, feel free to send me your notes or corrections on any aspect covered by this article or on the methodology used for collecting the data. The QmlMiner app is available in my KDE scratch git repository at: http://quickgit.kde.org/?p=scratch/tolszak/qmlminer.git It can be compiled with qmake or just run with qmlviewer (with QmlMiner.qml as an argument). The main.cpp file was added only as an optimization to avoid running the full qmlviewer tool while performing the comparisons, since EFL has no equivalent runtime tool (edc files are compiled to a binary). All data used for this article (startup times, memory consumption, some summaries in *.ods files) can be fetched from the QmlMinerArticleData git repository: http://quickgit.kde.org/?p=scratch/tolszak/QmlMinerArticleData.git