On Wed, Mar 30, 2011 at 12:24:02PM -0700, Steven Dake wrote:
On 03/27/2011 08:31 PM, Angus Salkeld wrote:
Hi
I am busy working the logging feature for version 0.5.0 (http://libqb.org/wiki/index.php/Todo_List)
I am working currently on github (https://github.com/asalkeld/libqb/tree/logging) and will merge as I get more tested and some feedback.
Regards Angus Salkeld
quarterback-devel mailing list quarterback-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/quarterback-devel
I had a look at logging performance mainly concerned with the flight recorder throughput. I'll send my benchmark program as a separate patch.
The performance looks close to logsys (with the ring buffer case) once the timestamp recording is turned off. I expect you would get more performance if the ring buffer was written directly into rather then the current memcpy operation that occurs.
Thanks for the great feedback! I'll improve it today.
-Angus
[root@mrg-01 test]# ./logsysbench heating up cache with logrec functionality log_printf 1 arguments: 8114860.988 operations/sec log_printf 2 arguments: 4337254.501 operations/sec log_printf 3 arguments: 3734625.016 operations/sec
[root@mrg-01 tests]# ./bench-log heating up cache with qb_log functionality qb_log 1 arguments: 7892055.620 operations/sec qb_log 2 arguments: 4557025.247 operations/sec qb_log 3 arguments: 3858326.867 operations/sec
note I get a segfault with: static void qb_log_external_source_free(struct qb_log_callsite *cs) { if (cs < __start___verbose || cs > __stop___verbose) {
free(cs);
+// free(cs);
That's odd, I wonder what the address was?
} }
I had to turn this off:
- _log_timestamp(char_time);
+// _log_timestamp(char_time);
Yes, we get the time in multiple places - not good. I'll investigate your idea of monotonic clock.
or performance is abysmal. [root@mrg-01 tests]# ./bench-log heating up cache with qb_log functionality qb_log 1 arguments: 1535615.688 operations/sec qb_log 2 arguments: 1352879.191 operations/sec qb_log 3 arguments: 1279942.331 operations/sec
I am pretty sure retrieving the system time does a bus lock on the processor. A better choice might be a monotonic clock (which is software based) - and then determining the actual time when output to a target. This way the expense of formatting the time is only done when outputting to a log target that is not memory based (or if memory based, the monotonic time is recorded and system time can be determined later by the replay operation.
(doing a clock_get (CLOCK_MONOTONIC) call in the logging function) [root@mrg-01 tests]# ./bench-log heating up cache with qb_log functionality qb_log 1 arguments: 5812625.371 operations/sec qb_log 2 arguments: 3630184.402 operations/sec qb_log 3 arguments: 3204488.335 operations/sec
The real time can be determined by taking the system start time in nanoseconds since an eopch and adding the monotonic time (converted to nanoseconds) to it.
In general timestamps while useful are also extremely expensive which is why it is helpful to keep them out of the main logging function or limit their impact.
With the timestamps compiled out of the code, i get following oprofile results
212295 84.1225 /root/libqb/tests/.libs/lt-bench-log CPU_CLK_UNHALT...| samples| %|
122939 57.9095 /lib64/libc-2.12.so 82025 38.6373 /root/libqb/lib/.libs/libqb.so.0.4.1 5772 2.7189 /lib64/libpthread-2.12.so
From lib64/libc-2.12:
48922 39.6869 libc-2.12.so lt-bench-log vfprintf 20878 16.9368 libc-2.12.so lt-bench-log _IO_default_ xsputn 12583 10.2077 libc-2.12.so lt-bench-log strchrnul 8209 6.6594 libc-2.12.so lt-bench-log __strlen_sse 42 7577 6.1467 libc-2.12.so lt-bench-log vsnprintf 7225 5.8611 libc-2.12.so lt-bench-log memcpy 7181 5.8254 libc-2.12.so lt-bench-log _IO_str_init _static_internal 5579 4.5258 libc-2.12.so lt-bench-log _IO_old_init 3175 2.5756 libc-2.12.so lt-bench-log _IO_no_init 1595 1.2939 libc-2.12.so lt-bench-log _IO_setb
from libqb.so: samples % image name symbol name 38018 49.8806 libqb.so.0.4.1 qb_log_real_ 10224 13.4142 libqb.so.0.4.1 _blackbox_logger 7885 10.3453 libqb.so.0.4.1 qb_rb_chunk_check 5211 6.8370 libqb.so.0.4.1 qb_rb_chunk_step 4499 5.9028 libqb.so.0.4.1 qb_rb_chunk_alloc 3148 4.1303 libqb.so.0.4.1 qb_rb_chunk_reclaim 2705 3.5490 libqb.so.0.4.1 qb_rb_space_free 2444 3.2066 libqb.so.0.4.1 qb_rb_chunk_commit 1054 1.3829 libqb.so.0.4.1 qb_rb_space_used 1030 1.3514 libqb.so.0.4.1 my_posix_sem_post
I notice qb_log_real_ takes alot of time. I got better results by not running through the entire 32 entries in the for loop in that function but instead only running through the one i was interested in: heating up cache with qb_log functionality qb_log 1 arguments: 9463182.071 operations/sec qb_log 2 arguments: 4920976.499 operations/sec qb_log 3 arguments: 4153522.499 operations/sec
One way to improve this would be to put a simple list in each array element and then link the list for the "active log entries". Considering it improves performance by 20% for small configured system cases, it may be worth it.
Yes, I doubt 32 will ever be used. I'll do that.
I had always wanted to remove the strlen operations from the blackbox logger (because it should be possible to pre-calculate the string lengths of filename and function name).
We could put the strlen for these into the callsite struct.
The rest looks as optimized is is likely to happen in the near term.
Regards -steve _______________________________________________ quarterback-devel mailing list quarterback-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/quarterback-devel