Milan Zamazal has uploaded a new change for review.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
logging: Don't crash on non-ASCII in SimpleLogAdapter
The values and messages passed to SimpleLogAdapter can be mixtures of strings and unicodes containing various non-ASCII characters. If they are not properly handled, concatenation of incompatible strings raises UnicodeDecodeError.
We avoid that problem in SimpleLogAdapter by converting all the processed texts to unicodes before joining them. We assume non-ASCII strings are in UTF-8 but if they are not then we shouldn't crash at least.
Change-Id: I1930817ade1799b218913348ffa099aab2211ef7 Bug-Url: https://bugzilla.redhat.com/1276906 Bug-Url: https://bugzilla.redhat.com/1260131 Signed-off-by: Milan Zamazal mzamazal@redhat.com --- M tests/utilsTests.py M vdsm/logUtils.py 2 files changed, 28 insertions(+), 3 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/42/48542/1
diff --git a/tests/utilsTests.py b/tests/utilsTests.py index 68547ce..f4f1a58 100644 --- a/tests/utilsTests.py +++ b/tests/utilsTests.py @@ -1,3 +1,4 @@ +# -*- coding: utf-8 -*- # # Copyright 2012-2013 Red Hat, Inc. # @@ -41,6 +42,7 @@ from vdsm import taskset from vdsm import utils from vdsm import cmdutils +import logUtils
from monkeypatch import MonkeyPatch from vmTestsData import VM_STATUS_DUMP @@ -1070,3 +1072,22 @@ proc.start() self._noIntrWatchFd(myPipe, isEpoll=False, mask=select.POLLIN) proc.join() + + +class LoggingTests(TestCaseBase): + + def setUp(self): + self._log = FakeLogger(logging.DEBUG) + + def test_nonascii(self): + # Just test it doesn't crash with mixture of string and unicodes in + # various codings. + invalid = chr(254) + chr(0) + u_invalid = unichr(253) + unichr(254) + extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', + e5=invalid, e6=u_invalid) + log = logUtils.SimpleLogAdapter(self._log, extra) + log.debug("Dobrý večer") + log.debug(u"Dobrý večer") + log.debug(invalid) + log.debug(u_invalid) diff --git a/vdsm/logUtils.py b/vdsm/logUtils.py index 5caa167..49bb2cf 100644 --- a/vdsm/logUtils.py +++ b/vdsm/logUtils.py @@ -90,10 +90,14 @@ warn = logging.LoggerAdapter.warning
def process(self, msg, kwargs): - result = '' + result = u'' for key, value in self.extra.iteritems(): - result += '%s=`%s`' % (key, value) - result += '::%s' % msg + if isinstance(value, str): + value = value.decode('utf-8', 'replace') + result += u'%s=`%s`' % (key, value) + if isinstance(msg, str): + msg = msg.decode('utf-8', 'replace') + result += u'::%s' % msg return (result, kwargs)
gerrit-hooks has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 1:
* #1276906::Update tracker: OK * #1260131::Update tracker: OK * Check Bug-Url::OK * Check Public Bug::#1276906::OK, public bug * Check Public Bug::#1260131::OK, public bug * Check Product::#1276906::OK, Correct product Red Hat Enterprise Virtualization Manager * Check Product::#1260131::OK, Correct product Red Hat Enterprise Virtualization Manager * Check TM::SKIP, not in a monitored branch (ovirt-3.6 ovirt-3.5 ovirt-3.4 ovirt-3.3 ovirt-3.2) * Check merged to previous::IGNORE, Not in stable branch (['ovirt-3.6', 'ovirt-3.5', 'ovirt-3.4', 'ovirt-3.3'])
Nir Soffer has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 1: Code-Review-1
(5 comments)
https://gerrit.ovirt.org/#/c/48542/1/tests/utilsTests.py File tests/utilsTests.py:
Line 1084: # various codings. Line 1085: invalid = chr(254) + chr(0) Line 1086: u_invalid = unichr(253) + unichr(254) Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) This test is valid, if we don't control the extra parameters - but is this a real need? where is the code adding unicode parameters to logs? Can we encode them instead of doing extra work in the logging? Logging is too expensive without these changes. Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid)
Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") This should be forbidden by the logging framework. There is no reason to use unicode format string, we have complete control over the format string in the source.
If we have to support unicode parameters to logging functions, we should encode them to utf-8 in the logging framework. Line 1092: log.debug(invalid)
https://gerrit.ovirt.org/#/c/48542/1/vdsm/logUtils.py File vdsm/logUtils.py:
Line 93: result = u'' Line 94: for key, value in self.extra.iteritems(): Line 95: if isinstance(value, str): Line 96: value = value.decode('utf-8', 'replace') Line 97: result += u'%s=`%s`' % (key, value) This decoding is not free. Line 98: if isinstance(msg, str): Line 99: msg = msg.decode('utf-8', 'replace') Line 100: result += u'::%s' % msg Line 101: return (result, kwargs)
Line 95: if isinstance(value, str): Line 96: value = value.decode('utf-8', 'replace') Line 97: result += u'%s=`%s`' % (key, value) Line 98: if isinstance(msg, str): Line 99: msg = msg.decode('utf-8', 'replace') So is this. Line 100: result += u'::%s' % msg Line 101: return (result, kwargs) Line 102: Line 103:
Line 96: value = value.decode('utf-8', 'replace') Line 97: result += u'%s=`%s`' % (key, value) Line 98: if isinstance(msg, str): Line 99: msg = msg.decode('utf-8', 'replace') Line 100: result += u'::%s' % msg Why return unicode string? There is no reason to do this, and encode it back to utf-8 later when emitting logs.
If we must handle unicode input, the right way is to encode the unicode values to utf-8 here, and return plain string. Line 101: return (result, kwargs) Line 102: Line 103: Line 104: class TracebackRepeatFilter(logging.Filter):
Milan Zamazal has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 1:
(3 comments)
https://gerrit.ovirt.org/#/c/48542/1/tests/utilsTests.py File tests/utilsTests.py:
Line 1084: # various codings. Line 1085: invalid = chr(254) + chr(0) Line 1086: u_invalid = unichr(253) + unichr(254) Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid)
This test is valid, if we don't control the extra parameters - but is this
Who knows where unicode parameters come from... In this particular case it's self.conf['vmId'] in Vm. But you can never be sure that no unicode appears anytime around. Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid)
Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer")
This should be forbidden by the logging framework. There is no reason to us
Complete control? In theory, yes. But how do you ensure it? If any single non-ASCII unicode passes and nothing handles it then we are in trouble.
As for conversions, I wasn't sure about the right direction, thanks for explanation. I can change it, i.e. converting unicodes to strings, in the common case it's just a couple of extra isinstance calls, which is negligible. And we'll be much safer. Line 1092: log.debug(invalid)
https://gerrit.ovirt.org/#/c/48542/1/vdsm/logUtils.py File vdsm/logUtils.py:
Line 96: value = value.decode('utf-8', 'replace') Line 97: result += u'%s=`%s`' % (key, value) Line 98: if isinstance(msg, str): Line 99: msg = msg.decode('utf-8', 'replace') Line 100: result += u'::%s' % msg
Why return unicode string? There is no reason to do this, and encode it bac
OK, I'll change the conversion direction. Line 101: return (result, kwargs) Line 102: Line 103: Line 104: class TracebackRepeatFilter(logging.Filter):
gerrit-hooks has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 2:
* #1276906::Update tracker: OK * #1260131::Update tracker: OK * Check Bug-Url::OK * Check Public Bug::#1276906::OK, public bug * Check Public Bug::#1260131::OK, public bug * Check Product::#1276906::OK, Correct product Red Hat Enterprise Virtualization Manager * Check Product::#1260131::OK, Correct product Red Hat Enterprise Virtualization Manager * Check TM::SKIP, not in a monitored branch (ovirt-3.6 ovirt-3.5 ovirt-3.4 ovirt-3.3 ovirt-3.2) * Check merged to previous::IGNORE, Not in stable branch (['ovirt-3.6', 'ovirt-3.5', 'ovirt-3.4', 'ovirt-3.3'])
Milan Zamazal has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 1:
(1 comment)
We now check for unicodes and convert them to strings.
https://gerrit.ovirt.org/#/c/48542/1/vdsm/logUtils.py File vdsm/logUtils.py:
Line 96: value = value.decode('utf-8', 'replace') Line 97: result += u'%s=`%s`' % (key, value) Line 98: if isinstance(msg, str): Line 99: msg = msg.decode('utf-8', 'replace') Line 100: result += u'::%s' % msg
OK, I'll change the conversion direction.
Done Line 101: return (result, kwargs) Line 102: Line 103: Line 104: class TracebackRepeatFilter(logging.Filter):
Nir Soffer has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 1:
(2 comments)
https://gerrit.ovirt.org/#/c/48542/1/tests/utilsTests.py File tests/utilsTests.py:
Line 1084: # various codings. Line 1085: invalid = chr(254) + chr(0) Line 1086: u_invalid = unichr(253) + unichr(254) Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid)
Who knows where unicode parameters come from... In this particular case it'
VmId is a uuid, cannot contain unicode.
We should validate input, not assume no control in the entire application. Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid)
Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer")
Complete control? In theory, yes. But how do you ensure it? If any single n
We control whats in the source, for input, we should encode/decode values as needed.
There are 2 options with unicode support: - You work internally with plain strings - any unicode input is encode to utf8 - You work internally with unicode - any non-unicode input is decoded to unicode, and any output is encoded.
Since we don't need to work with unicode internally, there is no need to do this, and we should go with the first option. Line 1092: log.debug(invalid)
Milan Zamazal has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 1:
(2 comments)
https://gerrit.ovirt.org/#/c/48542/1/tests/utilsTests.py File tests/utilsTests.py:
Line 1084: # various codings. Line 1085: invalid = chr(254) + chr(0) Line 1086: u_invalid = unichr(253) + unichr(254) Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid)
VmId is a uuid, cannot contain unicode.
Believe me or not, vm_id as passed to SimpleLogAdapter constructor in Vm.__init__ is a unicode instance (I didn't investigated why). It's not a problem per se, but it makes a unicode from the formatted string in SimpleLogAdapter.perform (thus the returned result is going to be unicode as well!) and when a non-ASCII message is logged, it crashes.
No objections against validating input, it's definitely a right thing to do, but a few additional checks don't harm, especially at a place where we've already been hit. Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid)
Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer")
We control whats in the source, for input, we should encode/decode values a
I agree, and the check in SimpleLogAdapter conforms to that rule now. Line 1092: log.debug(invalid)
Nir Soffer has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 2: Code-Review-1
(10 comments)
https://gerrit.ovirt.org/#/c/48542/2//COMMIT_MSG Commit Message:
Line 8: Line 9: The values and messages passed to SimpleLogAdapter can be mixtures of Line 10: strings and unicodes containing various non-ASCII characters. If they Line 11: are not properly handled, concatenation of incompatible strings raises Line 12: UnicodeDecodeError. Please show the failure in the log - it is not clear which code lead to this error.
Looks like you are not fixing the real issue. Line 13: Line 14: We avoid that problem in SimpleLogAdapter by converting all the Line 15: processed texts to unicodes before joining them. We assume non-ASCII Line 16: strings are in UTF-8 but if they are not then we shouldn't crash at
Line 11: are not properly handled, concatenation of incompatible strings raises Line 12: UnicodeDecodeError. Line 13: Line 14: We avoid that problem in SimpleLogAdapter by converting all the Line 15: processed texts to unicodes before joining them. We assume non-ASCII This is not correct any more, please update the description. Line 16: strings are in UTF-8 but if they are not then we shouldn't crash at Line 17: least. Line 18: Line 19: Change-Id: I1930817ade1799b218913348ffa099aab2211ef7
Line 17: least. Line 18: Line 19: Change-Id: I1930817ade1799b218913348ffa099aab2211ef7 Line 20: Bug-Url: https://bugzilla.redhat.com/1276906 Line 21: Bug-Url: https://bugzilla.redhat.com/1260131 These issues should be fixed in the engine side. Why should vdsm support input that is not supported by libvirt?
If vdsm get non-ascii string from engine using unknown encoding, it should decode the string on input, not delay the decoding to the last minute during logging. Please find the place when we get invalid input and do the decoding there.
https://gerrit.ovirt.org/#/c/48542/2/tests/utilsTests.py File tests/utilsTests.py:
Line 1085: invalid = chr(254) + chr(0) Line 1086: u_invalid = unichr(253) + unichr(254) Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) You should test this:
log = logUtils.SimpleLogAdapter(self._log, {"key": value}) msg, kwargs = log.process("msg", {"k": "v"})
self.assertEqual(msg, "key=%s::msg" % encoded_value) self.assertEqual(kwargs, {"k": "v"})
value and encoded_value should come from permutations list, failure to handle unicode will fail one test, but if ascii string is handled correct, its test will pass.
@permutations([ # value, encoded_value (unicode("\xd7\90", "utf8"), "\xd7\90"), ("ascii", "ascii"), # add more inputs if needed ]) def test_process(self, value, encoded_value): ... Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid)
Line 1086: u_invalid = unichr(253) + unichr(254) Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Again, this is invalid test. Format strings are part of the code and using non-ascci format strings is not allowed. Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid)
Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Same Line 1092: log.debug(invalid)
Line 1088: e5=invalid, e6=u_invalid) Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid) Same
Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid) Line 1093: log.debug(u_invalid) Same
https://gerrit.ovirt.org/#/c/48542/2/vdsm/logUtils.py File vdsm/logUtils.py:
Line 92: def process(self, msg, kwargs): Line 93: result = '' Line 94: for key, value in self.extra.iteritems(): Line 95: if isinstance(value, unicode): Line 96: value = value.encode('utf-8', 'replace') No need to replace, all unicode code points can be encoded as utt8 (or any other "utf" encoding). See https://en.wikipedia.org/wiki/UTF-8
Searching for usage of extra dict, there is no code using non-ascii values:
$ git grep -A2 'SimpleLogAdapter(' vdsm/logUtils.py:class SimpleLogAdapter(logging.LoggerAdapter): vdsm/logUtils.py- # Because of how python implements the fact that warning vdsm/logUtils.py- # and warn are the same. I need to reimplement it here. :( -- vdsm/storage/resourceManager.py: self._log = SimpleLogAdapter(self._log, {"ResName": self.fullName, vdsm/storage/resourceManager.py- "ReqID": self.reqID}) vdsm/storage/resourceManager.py- -- vdsm/storage/resourceManager.py: self._log = SimpleLogAdapter(self._log, {"ResName": self.fullName, vdsm/storage/resourceManager.py- "ResRefID": resRefID}) vdsm/storage/resourceManager.py- -- vdsm/storage/task.py: self.log = SimpleLogAdapter(self.log, {"Task": self.id}) vdsm/storage/task.py- vdsm/storage/task.py- def __del__(self): -- vdsm/virt/vm.py: self.log = SimpleLogAdapter(self.log, {"vmId": self.conf['vmId']}) vdsm/virt/vm.py- self._destroyed = False vdsm/virt/vm.py- self._recoveryFile = constants.P_VDSM_RUN + \
So supporting unicode values in the extra dict is simply not needed with current code. Line 97: result += '%s=`%s`' % (key, value) Line 98: if isinstance(msg, unicode): Line 99: msg = msg.encode('utf-8', 'replace') Line 100: result += '::%s' % msg
Line 95: if isinstance(value, unicode): Line 96: value = value.encode('utf-8', 'replace') Line 97: result += '%s=`%s`' % (key, value) Line 98: if isinstance(msg, unicode): Line 99: msg = msg.encode('utf-8', 'replace') Please show a log example where logging fails because logging message contains non-ascii characters. Line 100: result += '::%s' % msg Line 101: return (result, kwargs) Line 102: Line 103:
Nir Soffer has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 1:
(1 comment)
https://gerrit.ovirt.org/#/c/48542/1/tests/utilsTests.py File tests/utilsTests.py:
Line 1084: # various codings. Line 1085: invalid = chr(254) + chr(0) Line 1086: u_invalid = unichr(253) + unichr(254) Line 1087: extra = dict(e1=u'Hello', e2='Здра́вствуйте', e3=u'שלום', e4='你好', Line 1088: e5=invalid, e6=u_invalid)
Believe me or not, vm_id as passed to SimpleLogAdapter constructor in Vm.__
vm id may is unicode string like this:
u"474ed6cc-e085-469c-b779-f9b036d96931"
This happens because we use json to pass the parameters, and json parse everything to unicode. This is however unicode instance containing ascii characters, and it does not need any special handling. The correct fix for this is to do:
self.log = SimpleLogAdapter(self.log, {"vmId": str(self.conf['vmId'])})
In vm.py.
So SimpleLogAdapter does not need to handle unicode in the extra dict. Line 1089: log = logUtils.SimpleLogAdapter(self._log, extra) Line 1090: log.debug("Dobrý večer") Line 1091: log.debug(u"Dobrý večer") Line 1092: log.debug(invalid)
Nir Soffer has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 2:
(1 comment)
https://gerrit.ovirt.org/#/c/48542/2//COMMIT_MSG Commit Message:
Line 17: least. Line 18: Line 19: Change-Id: I1930817ade1799b218913348ffa099aab2211ef7 Line 20: Bug-Url: https://bugzilla.redhat.com/1276906 Line 21: Bug-Url: https://bugzilla.redhat.com/1260131
These issues should be fixed in the engine side. Why should vdsm support in
This is a patch for master, it should contain only bug 1260131. The backport of this patch for ovirt-3.6 should contain only the 3.6 bug, and so on. Keeping multiple bugs when this actually attempt to fix one bug is wasting reviewers and maintainer time when they try to understand the problem.
The bug contains no vdsm log. You should request vdsm.log from the reporter or QE assignee, or include one from your reproduction.
Please explain the nature of the issue in the bug. The reason for the failure should be documented, so we can easily evaluate the proposed fix. See this bug for example: https://bugzilla.redhat.com/1279485
Nir Soffer has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 2:
I think that the correct fix is https://gerrit.ovirt.org/48565 But I cannot be sure since there are no logs in the bug.
gerrit-hooks has posted comments on this change.
Change subject: logging: Don't crash on non-ASCII in SimpleLogAdapter ......................................................................
Patch Set 3:
* Update tracker: IGNORE, no Bug-Url found * Check Bug-Url::WARN, no bug url found, make sure header matches 'Bug-Url: ' and is a valid url. * Check merged to previous::IGNORE, Not in stable branch (['ovirt-3.6', 'ovirt-3.5', 'ovirt-3.4', 'ovirt-3.3'])
vdsm-patches@lists.fedorahosted.org