Qt Developer Adventures: Simple QML vs EFL comparison

Monday, March 18, 2013

Simple QML vs EFL comparison

Recently I found a blog post about elemines, a complete minesweeper clone based on the Enlightenment Foundation Libraries (EFL). Since EFL is designed to work efficiently even on PDAs, I came up with the idea of implementing a similar clone in plain QML/JavaScript (QmlMiner) and performing a simple comparative analysis, to see how the QML version would compare with the EFL one.
The following areas were analyzed:

developer experience
source code size and languages used
memory consumption
startup time

The comparison concludes with a limited performance check.
The comparison can be viewed from many angles. Note that I was comparing a virtual-machine-based runtime (Qt 4/QML with QtQuick 1.1 and JavaScript) with an EFL app written in C and compiled into a native binary, to see how much of an advantage low-level C programming has over a more modern technology such as QML.

1. Developer experience

While creating QmlMiner, I copied as much of the appearance and functionality of elemines as I could, but deliberately did not look at the elemines code.

This is what both apps look like:

QmlMiner

elemines

Surprisingly, the QML implementation didn't take much time:

Activity	Hours	Comment
Development time spent on JavaScript code	8	I had never written a minesweeper engine before
Development time spent on QML code	12	Dialogs, buttons, animations (explosions) etc.

I should also mention my experience with related technologies:

Intermediate Qt knowledge (3 years)
A few months of QML development
Basic JavaScript knowledge (only used with QML code)

Plain Qt knowledge was not required in this case, because QmlMiner itself contains no C/C++ code at all; understanding QML and simple JS was enough. QmlMiner can be executed with the qmlviewer tool, but I later added a simple main.cpp for the memory and startup tests, to make the QML app as "light" as possible and thus closer to elemines: qmlviewer has many features that are not needed for this task.
After creating QmlMiner, I reviewed some of the elemines source code to spot similarities. Both applications have corresponding game engines, and the default.edc file has a syntax similar to JSON. It has a programs section which I believe is somewhat similar to QML states.
Another observation is that the edc file is a data resource used by the C code. The C code creates objects, imperatively defines the interactions between them and lays out the UI. Please note that these are only my guesses, as a developer not very experienced in either plain C or EFL.

2. Source code size and used languages

I measured the size of the source code with the wc command and the SLOCCount tool. SLOCCount was used to count the lines of elemines' C code and QmlMiner's C++ code (it skips comments). wc (with the -l option) was used to count the lines of the QML files and of elemines' default.edc file. The QML files didn't have any comments, and the edc file had only 20 comment lines, which I excluded. I assumed that the whole QmlMinerModel.qml file contains JS code, since it is the "game engine"; the other files describe the GUI (look and behavior). I didn't analyze the build system files (Makefiles in elemines and a .pro file in QmlMiner) because they are not very relevant.
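For readers who want to repeat a rough count without SLOCCount, the two counting rules can be sketched in C++. This is only an approximation under stated assumptions: `wcLineCount` mimics `wc -l`, which counts newline characters (so a last line without a trailing newline is not counted), while `roughSloc` is a hypothetical helper that skips blank lines and lines starting with `//`. It does not reproduce SLOCCount's language-specific rules and does not handle `/* */` comment blocks.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts lines the way `wc -l` does: it counts '\n' characters, so a final
// line without a trailing newline is not included in the result.
std::size_t wcLineCount(const std::string& text) {
    std::size_t n = 0;
    for (char c : text)
        if (c == '\n') ++n;
    return n;
}

// Hypothetical rough approximation, NOT SLOCCount's algorithm: counts lines
// that are non-blank and do not start with a "//" comment marker.
std::size_t roughSloc(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;         // blank line
        if (line.compare(first, 2, "//") == 0) continue;  // comment-only line
        ++n;
    }
    return n;
}
```

A file can be loaded into a `std::string` with `std::ifstream` and `std::istreambuf_iterator<char>` before calling either function.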

The results are as follows:

Purpose	QmlMiner		EFL elemines
Business logic	JavaScript	137	C	572
UI and behavior	QML	518	edc	915
Bootstrap	C++	13
Total		668		1487

3. Memory consumption

Definitions

Before presenting memory consumption and startup statistics, I would like to explain some terms:

QML App with compiled-in resources - the QmlMiner application with all needed resource files compiled into the binary as Qt resources
QML App - the standard QmlMiner application, with all resources kept outside the application binary
EFL App - the original elemines 0.1 application
i5 32bit - test machine: Intel i5, 4GB RAM, HDD, Ubuntu 12.10 32bit, Qt 4.8.4, EFL 1.7.4, kernel 3.5.0-25-generic, IceWM 1.3.7
i7 32bit - test machine: Intel i7, 8GB RAM, SSD, Ubuntu 12.10 32bit, Qt 4.8.4, EFL 1.7.4, kernel 3.5.0-26-generic, IceWM 1.3.7
i5 64bit - test machine: Intel i5, 4GB RAM, HDD, openSUSE 12.2 64bit, Qt 4.8.4, EFL 1.7.5, kernel 3.4.28-2.20-desktop, IceWM 1.3.7
i7 64bit - test machine: Intel i7, 6GB RAM, HDD, openSUSE 12.2 64bit, Qt 4.8.4, EFL 1.7.99, kernel 3.4.28-2.20-desktop, IceWM 1.3.7

I would like to point out the main differences that can affect the test results:

i7 64bit has the newest EFL version. According to this article, there is a high probability that this version has a bigger memory footprint than the older ones.
i7 32bit has an SSD drive, so its cold start results differ significantly from those of the other test machines (which use HDDs).

Tools

I used the ksysguard application for memory analysis. To make sure it is a trustworthy tool, I cross-checked the first samples with smem, which gave the same results. All binaries were examined just after starting and showing the main window, without any interaction. (A note: I have worked on benchmarks before in my professional career.) The lightweight IceWM window manager was used, with no background tasks running that could interfere with the test, which removed most of the risk of a desktop environment such as E17 or KDE Plasma Workspaces blurring the results. Additional explanations:

Measure unit - KiB
Private - memory used only by the examined process
Shared - memory that can be shared between processes (e.g. shared libraries' own memory)
Rss (Resident set size) - Private + Shared - shown in /proc/<pid>/status under VmRSS
Pss (Proportional set size) - Private + Shared/(number of processes sharing it) - decreases as more processes use the same shared libraries
Swap - memory swapped out to the disk

PSS is the most important value, because it best reflects an application's real memory usage: the shared memory value is divided by the number of processes that use it. The following tables show how the PSS of each compared application changes across the test machines.
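To make the PSS definition concrete: on Linux the kernel reports a `Pss:` value (in kB) for every mapping in /proc/<pid>/smaps, already dividing each shared page by the number of processes that map it. A process's total PSS is therefore the sum of those fields. The sketch below assumes that file format. It is not how ksysguard or smem are implemented, only a minimal illustration; `totalPssKiB` is a hypothetical name.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Sums the "Pss:" fields of a /proc/<pid>/smaps listing (values in kB).
// Lines such as "Pss_Anon:" are not matched because they do not start with
// the exact prefix "Pss:".
long totalPssKiB(std::istream& smaps) {
    long total = 0;
    std::string line;
    while (std::getline(smaps, line)) {
        if (line.rfind("Pss:", 0) == 0) {            // line starts with "Pss:"
            std::istringstream fields(line.substr(4));
            long kib = 0;
            if (fields >> kib) total += kib;
        }
    }
    return total;
}
```

Usage: `std::ifstream f("/proc/1234/smaps"); long pss = totalPssKiB(f);`, where 1234 is a placeholder PID and `<fstream>` is required. On kernels newer than the ones tested here, /proc/<pid>/smaps_rollup contains a single pre-summed `Pss:` line that the same function can read.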

Memory consumption comparison - 1 application instance

Measure unit [KiB]		QML App	QML App with compiled-in resources	EFL App
i5 32bit HDD
	Private	12248	12284	13180
	Shared	15300	15300	2620
	Rss	27548	27584	15800
	Pss	18498	18534	13606
	Swap	0	0	0
i7 32bit SSD
	Private	12404	12388	13756
	Shared	15504	15504	3736
	Rss	27908	27892	17492
	Pss	19119	18786	14453
	Swap	0	0	0
i5 64bit HDD
	Private	14880	14860	17848
	Shared	10972	11004	5096
	Rss	25852	25864	22944
	Pss	18874	18871	18765
	Swap	0	0	0
i7 64bit HDD
	Private	14568	14564	19304
	Shared	10988	11020	6592
	Rss	25556	25584	25896
	Pss	18761	18787	20698
	Swap	0	0	0

Memory consumption comparison - 10 application instances

Measure unit [KiB]		QML App	QML App with compiled-in resources	EFL App
i5 32bit HDD
	Private	7908	7920	9808
	Shared	19708	19856	6348
	Rss	27616	27776	16156
	Pss	11392	11391	10725
	Swap	0	0	0
i7 32bit SSD
	Private	8076	8000	9908
	Shared	20028	19900	7600
	Rss	28104	27900	17508
	Pss	11584	11461	10903
	Swap	0	0	0
i5 64bit HDD
	Private	9864	9932	12716
	Shared	16272	16276	10012
	Rss	26136	26208	22728
	Pss	12793	12869	14125
	Swap	0	0	0
i7 64bit HDD
	Private	9356	9408	14172
	Shared	16116	16148	11728
	Rss	25472	25556	25900
	Pss	12163	12071	15798
	Swap	0	0	0

Memory consumption comparison - 10 application instances

Measure unit [KiB]		QML App	QML App with compiled-in resources	EFL App
i5 32bit HDD
	Private	7916	7924	9484
	Shared	19760	19856	6356
	Rss	27676	27780	15840
	Pss	10037	10060	9953
	Swap	0	0	0
i7 32bit SSD
	Private	8004	7988	10204
	Shared	19892	19980	7600
	Rss	27896	27968	17804
	Pss	10128	10124	10744
	Swap	0	0	0
i5 64bit HDD
	Private	9792	9764	12764
	Shared	16272	16084	10020
	Rss	26064	25848	22784
	Pss	11630	11575	13516
	Swap	0	0	0
i7 64bit HDD
	Private	9460	9440	14176
	Shared	16384	16380	11736
	Rss	25844	25820	25912
	Pss	11285	11195	15030
	Swap	0	0	0

There are a few interesting results:

differences between QmlMiner and QmlMiner with compiled-in resources are very small (0-2%) and can be ignored
as the number of processes increases, QmlMiner's PSS decreases faster than elemines'. With 10 instances:

QmlMiner's PSS is 1% higher than elemines' on i5 32bit
elemines' PSS is 6% higher than QmlMiner's on i7 32bit
elemines' PSS is 16% higher than QmlMiner's on i5 64bit
elemines' PSS is 33% higher than QmlMiner's on i7 64bit

elemines consumes more memory than QmlMiner on 64bit platforms (the only exception is the one-instance test on i5 64bit)
in the one-instance test on 32bit platforms, QmlMiner's PSS is significantly higher (32-36%), but the gap shrinks as the number of processes increases.
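The percentages above can be reproduced from the PSS rows of the last memory table, for example 11285 KiB for QmlMiner and 15030 KiB for elemines on i7 64bit. The small sketch below shows the arithmetic and why the base of the percentage matters: (15030 - 11285) / 11285 is about 33%, so elemines is 33% above QmlMiner, which is the same as QmlMiner being about 25% below elemines. The function names are illustrative only.

```cpp
#include <cmath>

// Percentage by which `value` is above `reference`:
// (value - reference) / reference * 100. A negative result means "below".
double percentAbove(double value, double reference) {
    return (value - reference) / reference * 100.0;
}

// The same value rounded to a whole percent.
long roundedPercentAbove(double value, double reference) {
    return std::lround(percentAbove(value, reference));
}
```

With the i7 64bit values, `roundedPercentAbove(15030, 11285)` gives 33 and `roundedPercentAbove(11285, 15030)` gives -25.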

The main conclusion is that on 64bit architectures elemines occupies considerably more memory than QmlMiner (up to 33% more). Additionally, elemines has a low ratio of shared to private memory, so its PSS does not decrease much as the number of processes sharing common code increases. On the i5 32bit platform, one instance of QmlMiner has a 36% higher PSS than elemines; however, on the i7 64bit platform, each of 10 elemines instances has a 33% higher PSS than the corresponding QmlMiner instance. We could say that on homogeneous platforms (based on either Qt for QmlMiner or EFL for elemines), as the number of processes based on the same framework increases, QmlMiner and other applications based on the Qt Quick technology consume noticeably less memory than elemines, thanks to more aggressive code and resource sharing.

4. Startup time

Tools

The time(1) command was used to measure startup time. As before, the lightweight IceWM was used with no background user tasks or costly services running, to reduce interference from a desktop such as E17 or KDE Plasma Workspaces. A "warm start" test was performed to measure the startup of the application with caches populated, using the following command:

for i in {1..100}; \
   do /usr/bin/time -f"%S;%U;%e" \
   -a -o "$csvFile" ./ten_runs_with_caches.sh; \
done

The ten_runs_with_caches.sh script invokes the corresponding binary 10 times in sequence. This matters because the measured times are very small on modern machines and time(1) reports results in seconds with only two decimal places (10 ms resolution); timing ten runs at once gives an effective resolution of 1 ms per run. "Cold start" tests were also performed, using the ten_runs_without_caches.sh script, which invokes the corresponding binary after dropping the caches before every execution by running:

/sbin/sysctl -q vm.drop_caches=3 && ./binary

So for every application on every platform except i7 64bit, 1000 executions (100 batches of 10) were collected and the median was computed. I had limited access to the i7 64bit test machine, so only 250 executions (25 batches of 10) were gathered there.
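The post-processing step can be sketched as follows. The sketch assumes that each line of $csvFile has the "%S;%U;%e" format produced by the command above and covers one batch of 10 executions. It computes the median System + User time per execution in milliseconds. This is an illustration with hypothetical function names, not the author's processing: the published data is in .ods spreadsheets, and the table below reports System and User medians separately, so its columns need not add up exactly to a median of sums.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Median of a sample; the average of the two middle values for even sizes.
// Returns NaN for an empty sample.
double median(std::vector<double> v) {
    if (v.empty()) return std::numeric_limits<double>::quiet_NaN();
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Reads lines written by /usr/bin/time -f"%S;%U;%e", where each line covers
// a batch of `runsPerBatch` executions, and returns the median System + User
// CPU time per single execution in milliseconds. Lines that do not start
// with "number;number" (e.g. "Command exited with non-zero status") are skipped.
double medianCpuMsPerRun(std::istream& csv, int runsPerBatch = 10) {
    std::vector<double> perRunMs;
    std::string line;
    while (std::getline(csv, line)) {
        std::istringstream fields(line);
        double sys = 0.0, user = 0.0;
        char sep = 0;
        if (fields >> sys >> sep >> user && sep == ';')
            perRunMs.push_back((sys + user) * 1000.0 / runsPerBatch);
    }
    return median(perRunMs);
}
```

Computing the median over batches, rather than over single runs, is what makes the two-decimal output of time(1) usable at millisecond scale.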

To measure only the startup time, I modified the source code of both applications. QmlMiner was forced to exit just before calling QApplication::exec():

   ...
   viewer.show();
   exit(0);  // benchmark only: quit before entering the event loop
   return app.exec();
 }

and elemines exited just before returning from the gui function:

   ...
   evas_object_show(window);
   exit(0);  /* benchmark only: quit before returning to the caller */
   return EINA_TRUE;
 }

Additional explanations:

Measure unit - milliseconds
System - total number of CPU-seconds used by the system on behalf of the process (in kernel mode)
User - total number of CPU-seconds that the process used directly (in user mode)

The following table shows the median System and User times spent by each application in the different environments.

QML App, QML App with compiled-in resources and EFL App startup time on different targets

Measure unit[ms]		Warm start			Cold start
Measure unit[ms]		System	User	System + User	System	User	System + User
i5 32bit HDD
	QML App	28	124	152	80	162	242
	QML App with compiled-in resources	27	123	150	80	163	243
	EFL App	20	92	112	74	123.5	197.5
i7 32bit SSD
	QML App	17	97	114	47.5	98	145.5
	QML App with compiled-in resources	18	97	115	47	98	145
	EFL App	14	67	81	38	60	98
i5 64bit HDD
	QML App	14	91	105	55	116	171
	QML App with compiled-in resources	15	91	106	55	115	170
	EFL App	14	56	70	55	86	141
i7 64bit HDD
	QML App	14	90	104	46	106	152
	QML App with compiled-in resources	15	91	105	47	106	153
	EFL App	24	83	107	58	87	145

There is no significant difference between the startup times of the two versions of QmlMiner. I suppose that more dissimilarity could be seen if the test were performed on embedded devices. The relative difference between QmlMiner and elemines varies a lot between the test cases. In the i5 64bit warm start test, QmlMiner starts 50% slower than elemines, but in absolute terms that is only 35 ms (apart from the i7 64bit machine, the difference in the other cases ranges from 30 to 47.5 ms). I expected much larger differences, because Qt has to initialize the QML engine and the JavaScript engine, and has to parse and compile the QML source files to QML bytecode. In the i7 64bit warm start test, QmlMiner even starts 3 ms (about 3%) faster than elemines. This could be caused by the more recent EFL version used to build elemines there (1.7.99, versus 1.7.4 and 1.7.5 on the other platforms). Another observation is that the 64bit builds of the applications start faster than their 32bit builds.

Summary:

I am amazed how quickly QmlMiner could be implemented. Originally it even offered some hidden features, but I removed them on purpose, because QmlMiner should be as similar to the EFL-based elemines as possible for the comparison. For example, the QtQuick implementation had a dimension parameter which could change the number of fields on the board (I have played on a 50x50 board), and the number of bombs could be changed using a "bombCount" parameter.

Taking a more scientific approach to comparing developer experience, an average-bugs-per-1000-lines metric could be used, which would count as a point for QtQuick, since QmlMiner needs fewer than half as many lines (668 vs 1487). Various models take the specifics of the C language into account when estimating the workload of C-based projects, e.g. COCOMO in the SLOCCount tool. There is no such estimation for QML as of now, but most software engineers familiar with the topic would say that QML is clearly a higher-level language than C, so writing an application in QML is much more organized and less error-prone than doing so in plain C. I suppose that the EFL edc file includes some declarative code for the application's behavior, but unfortunately I could not spot any. I did not go through the EFL docs and I am not sure whether I will in the future; you are welcome to do so.

The startup times are relatively short for both applications (the largest difference is 47.5 ms). Proportional memory consumption is comparable, and QtQuick has an advantage on homogeneous platforms thanks to its full portability (binary independence). It also performs well on 64bit architectures. In such basic applications there is little else to measure, except perhaps FPS while resizing. I noticed issues when resizing elemines (slow refreshing of the window's content) and asked about this on the Enlightenment forum (it could be caused by a broken EFL build), but I am still waiting for a precise answer.

All this looks like a big eye-opener for QtQuick skeptics, especially because I was comparing:

a QtQuick app whose QML code runs on a system-independent, virtual-machine-based runtime and is parsed/compiled to bytecode at runtime (details for QML2 at http://www.kdab.com/qml-engine-internals-part-1-qml-file-loading/), with business logic written in JavaScript. Qt/C++ could be used here for performance reasons, but that wasn't necessary for this test.
an EFL app written in plain C, optimized at compile time by GCC, possibly for a given CPU and operating system, with business logic written in C as well. (Note: there is elev8, an early effort at JavaScript bindings for EFL, but it is not mentioned in the official documentation. Until such projects reach a stable milestone, I see EFL's approach to GUI programming as more compiled-in, or "static", than QtQuick's.) EFL's Elementary graphical framework is not extensible at runtime, so new components cannot be added without going back to the C compiler.

Taking these points into account, it is surprising to see a QtQuick app performing similarly to a C-based EFL app. In addition, QtQuick introduces useful features not present in the EFL app (binary independence, network transparency, safer memory operations) without sacrificing performance compared to EFL. Furthermore, these tests can be repeated for Qt 5/QML2, which is reportedly even more optimized.

If you are an EFL or QML expert or enthusiast, feel free to send me your notes or corrections on any aspect covered by this article or on the methodology used to collect the data. The QmlMiner app is available in my KDE scratch git repository (http://quickgit.kde.org/?p=scratch/tolszak/qmlminer.git). It can be compiled with qmake or simply run with qmlviewer (with QmlMiner.qml as an argument). The main.cpp file was added only as an optimization, to avoid running the full qmlviewer tool during the comparisons, since EFL has no equivalent runtime tool (edc files are compiled to a binary). All data used for this article (startup times, memory consumption and some summaries in *.ods files) can be fetched from the QmlMinerArticleData git repository at http://quickgit.kde.org/?p=scratch/tolszak/QmlMinerArticleData.git

21 comments:

Cedric BAIL, March 18, 2013 at 9:45 PM
Excellent article and very interesting! I am an EFL developer, so just a few remarks regarding EFL and its use.

First, I totally agree with your view on EDC. It is not designed at all for writing an application with it; it is designed to hold your theme, and that's it. Every app can be fully rethemed without a change in the application. That's why edc files have more limits than QtQuick. It is also why we are working on Elev8; hopefully later this year we will be able to release a framework around it and edc, but at this point it is not ready for prime time.

Now back to the startup time and multi-process benchmark. We do have two additional daemons that exist to mitigate what you are seeing there. The first one is evas_cserve2, which enables sharing of images and fonts across all applications. It is very useful when you use the same theme and fonts, as the second application will not need to load them at all. This reduces application startup time and memory usage (I think it will not impact your startup time benchmark, as you exit before the rendering of the first frame; by the way, if you could change your benchmark to stop after the rendering of the first frame, that would be interesting). Sadly it seems to have been broken during the last month, and it needs to be fixed before the 1.8 release.

The second daemon is elm_quicklaunch; that one loads all EFL libraries and pre-initializes EFL as much as it can. Then, when an app starts, it asks that daemon to fork, open and then start that application. This drastically reduces the load time of an application.

Both of them should give you an order of magnitude faster startup time and also reduce memory usage significantly. Note that for Elev8 we have the same concept, where a daemon sits in the background, ready to start up a JS application as soon as possible. So we do not expect any slowdown in the startup time of Elev8 applications. I guess it should be doable to do the same in QtQuick, so there should not be any time difference in the end.
Unknown, March 19, 2013 at 5:20 AM
Well, elemines is surely not the best EFL application around to do some benchmarks. As you know, I already did fix some memory leaks in 0.2.x. I'm not an expert and did this mostly to learn C/EFL. Some comments:

- edc code size: could be a lot shorter without the default values that I chose to keep for clarity. Moreover, it could be factored now. At the beginning of the project, I didn't need to.

- I provide extra fonts that may need more memory. Your screenshot of the qml application doesn't show the same things.

- Memory footprint high values for 64bit in elemines is probably due to pointers size, but I don't know how QML works.

- As ced said, I used elementary+edje which was easier for me to begin but doesn't seem to be the most efficient way.

I think there are better EFL applications for benchmarking, for instance expedite. Anyway, thanks for your work, it will surely be useful for me too :-)
Ian Monroe, March 19, 2013 at 10:31 PM
A Raspberry Pi would be a better platform to do benchmarks. Basically they are both the same on a desktop. :)

Also I'd port to Qt 5. QML in Qt 4 is not the most efficient beast (QML->QGV->QPainter->raster...)
renoX, March 20, 2013 at 3:08 AM
Interesting test on wrong hardware == not very interesting result..
How about testing the differences on a 10year old low end computer?
Simple application should be tested on low end computers, not speed daemon..
Stefan, March 20, 2013 at 12:43 PM
WTF is EFL?
European Football League?

From Wikipedia, the free encyclopedia:
"EFL most commonly refers to English as a Foreign Language, ..."?
Cristian, March 20, 2013 at 1:12 PM
EFL stands for Enlightenment Foundation Libraries [1], which are used by the Enlightenment (window manager) [2] and Tizen (operating system) [3]

[1] http://en.wikipedia.org/wiki/Enlightenment_Foundation_Libraries
[2] http://en.wikipedia.org/wiki/Enlightenment_%28window_manager%29
[3] http://en.wikipedia.org/wiki/Tizen
Anonymous, March 20, 2013 at 3:35 PM
Umm, where is the development time for EFL?

Also, +1 on porting to Qt5, QtQuick2 has better performance in some aspects, and a little more bloat in others, so I am curious of the results vs QtQuick1.
