Implementation-filled fields in https://fedorahosted.org/lumberjack/wiki/FieldList

Miloslav Trmač

8 Oct 2012

6:50 p.m.

Hello, when discussing a lumberjack-enabling patch set, it was pointed out that [ugp]id fields should be filled based on SCM_CREDENTIALS - and the Fedora configuration we are considering actually does that.

Both rsyslog's imuxsock and libumberlog add the following fields:

* pid
* uid
* gid

(and each adds some other fields that the other does not).

Is there a consensus to mark these three as "to be filled by log implementation, not by applications"? If so, I'll go ahead and edit the wiki.

Thank you, Mirek
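
For context: on Linux, the receiver of a Unix-domain socket can ask the kernel to attach the sender's pid, uid, and gid to each message by enabling SO_PASSCRED; the values arrive as SCM_CREDENTIALS ancillary data. The following is a minimal sketch, assuming Linux and Python's standard socket module. The names recv_with_credentials and demo are hypothetical, and the code is not meant to reproduce what imuxsock or libumberlog actually do.

```python
import os
import socket
import struct

# Layout of Linux's struct ucred: pid_t pid; uid_t uid; gid_t gid.
UCRED = struct.Struct("iII")


def recv_with_credentials(sock, bufsize=4096):
    """Receive one datagram plus the kernel-supplied sender credentials."""
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(UCRED.size)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            pid, uid, gid = UCRED.unpack(cdata[:UCRED.size])
            return data, {"pid": pid, "uid": uid, "gid": gid}
    return data, None  # no credentials were attached


def demo():
    """Send a datagram to ourselves and read back the attached credentials."""
    receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with receiver, sender:
        # Must be enabled on the receiving socket before the message is sent.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        sender.send(b"hello")
        return recv_with_credentials(receiver)
```

Because the kernel fills in these values, an unprivileged sender cannot simply claim another process's pid or uid, which is the argument for having the log implementation own these fields.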

William Heinbockel

8 Oct

7:44 p.m.

Mirek,

I am on board with that opinion. I view fields such as pid, uid, and gid as platform-specific; they should be filled out by the logger. Applications should only report the application-specific information.

Now, a more interesting question: what should happen if the application (incorrectly) attempts to fill out a value that the log implementation is responsible for?

While this might be an implementation-dependent decision, I believe the recommendation should be that the log implementation either drops or renames the application's value in preference of its own.
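
The "drop or rename" recommendation can be sketched as follows. This is only an illustration under stated assumptions: the function name, the on_conflict parameter, and the "app_" rename prefix are hypothetical and were not proposed in the thread.

```python
# Field names the thread proposes leaving to the log implementation.
IMPLEMENTATION_FIELDS = ("pid", "uid", "gid")


def apply_implementation_fields(app_fields, impl_values, on_conflict="rename"):
    """Merge application fields with implementation-filled ones.

    on_conflict="drop" discards an application-supplied pid/uid/gid;
    on_conflict="rename" keeps it under a hypothetical "app_" prefix.
    """
    if on_conflict not in ("drop", "rename"):
        raise ValueError("on_conflict must be 'drop' or 'rename'")
    record = {}
    for name, value in app_fields.items():
        if name in IMPLEMENTATION_FIELDS:
            if on_conflict == "rename":
                record["app_" + name] = value
            continue  # the application's value never occupies the real name
        record[name] = value
    # The implementation's own values always win.
    for name in IMPLEMENTATION_FIELDS:
        if name in impl_values:
            record[name] = impl_values[name]
    return record
```

Renaming keeps the application's claim available for inspection, while dropping is simpler; the sketch does not handle an application that also submits a field already named "app_pid".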

On Mon, 2012-10-08 at 13:50 -0400, Miloslav Trmac wrote:

...

Hello, when discussing a lumberjack-enabling patch set, it was pointed out that [ugp]id fields should be filled based on SCM_CREDENTIALS - and the Fedora configuration we are considering actually does that.

Both rsyslog's imuxsock and libumberlog add the following fields:

pid

uid

gid

(and each adds some other fields that the other does not).

Is there a consensus to mark these three as "to be filled by log implementation, not by applications"? If so, I'll go ahead and edit the wiki.

Thank you, Mirek

_______________________________________________
lumberjack-developers mailing list
lumberjack-developers@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/lumberjack-developers

David Lang

10:15 p.m.

This is the sort of thing I was talking about a few weeks ago when I was proposing having a 'trusted' subtree.

These fields are probably not the only ones that we will end up wanting only the logging software to fill. Flagging each individual field for special treatment is going to be more pain over time than tagging a subtree: each new protected field would require updates to all logging software to prevent users from setting it, whereas with a subtree the field simply does not exist until the logging software is updated to produce it.

I can easily see where you would want to know if an application logs these things.

In addition, these fields are all ones that an application may be logging today without referring to itself (the pid could be that of a process it spawns, uid/gid could refer to a user who is logging in, etc.).

David Lang

Dmitri Pal

9 Oct

12:22 a.m.

On 10/08/2012 05:15 PM, david@lang.hm wrote:

...

This is the sort of thing I was talking about a few weeks ago when I was proposing having a 'trusted' subtree.

These fields are probably not the only ones that we will end up wanting to have the logging software be the only software that fills them. Flagging each individual field for special treatment is going to be more pain over time than tagging a subtree (each new field that needs to be protected will require updates to all logging software to prevent users from setting it vs not existing until the logging software is updated to produce it)

I can easily see where you would want to know if an application logs these things.

In addition, these fields are all ones that an application may be logging today, but not referring to itself (the pid could be of a process it spawns, uid/gid could be referring to a user that is logging in, etc)

David Lang

Then maybe the fields that are generated by the interface should have a prefix that by convention is not used for other fields. For example, instead of pid, use resolved_pid or something like that. And IMO an attempt to explicitly overwrite such a field in the call should return an error right away. If the prefix is unique, there is a very low chance that the resolved fields would be overwritten unintentionally.
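
A minimal sketch of this "fail immediately" behaviour, assuming a logging call that takes fields as keyword arguments. The function log_event, the exception class, and the exact "resolved_" prefix are hypothetical; the message above only suggests "resolved_pid or something like".

```python
import os

RESERVED_PREFIX = "resolved_"  # hypothetical spelling of the reserved prefix


class ReservedFieldError(ValueError):
    """Raised when a caller tries to set an implementation-owned field."""


def log_event(message, /, **fields):
    """Build a record, rejecting any caller-supplied reserved field."""
    for name in fields:
        if name.startswith(RESERVED_PREFIX):
            raise ReservedFieldError(
                f"{name!r} is filled by the log implementation"
            )
    record = dict(fields)
    record["msg"] = message
    # Values the interface resolves itself go under the reserved prefix.
    record[RESERVED_PREFIX + "pid"] = os.getpid()
    record[RESERVED_PREFIX + "uid"] = os.getuid()
    record[RESERVED_PREFIX + "gid"] = os.getgid()
    return record
```

With this shape an application can still log a plain pid of its own (for example, the pid of a child process) without colliding with the resolved value.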

--
Thank you,
Dmitri Pal
Sr. Engineering Manager for IdM portfolio
Red Hat Inc.

David Lang

12:33 a.m.

On Mon, 8 Oct 2012, Dmitri Pal wrote:

...

Then maybe the fields that are generated by the interface should have a prefix that by convention is not used for other fields. For example, instead of pid, use resolved_pid or something like that. And IMO an attempt to explicitly overwrite such a field in the call should return an error right away.

Exactly. If we support hierarchical structures, we can use a single tag; if we make everything flat, we can reserve a prefix (ideally one that translates to a subtree name, so that people who want it flat can treat it as flat and people who want it hierarchical can treat it as a subtree).

I would actually reserve two prefixes/subtree names:

* the first for the tags that are generated by the logging infrastructure
* the second as a place to relocate any tags that are passed to us in a reserved space

For the sake of argument, call these 'trusted.' and 'forged.'. If someone submits something with trusted.uid, move it to forged.trusted.uid. If someone submits something with forged.uid, move it to forged.forged.uid.
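
The relocation rule can be sketched like this, using the placeholder names 'trusted.' and 'forged.' from the message. The function names are hypothetical, and the sketch assumes flat field names stored in a dict.

```python
TRUSTED = "trusted."  # placeholder name from the thread, not a final choice
FORGED = "forged."    # placeholder name from the thread, not a final choice


def relocate_reserved(app_fields):
    """Move any application field in a reserved space under FORGED."""
    out = {}
    for name, value in app_fields.items():
        if name.startswith((TRUSTED, FORGED)):
            out[FORGED + name] = value
        else:
            out[name] = value
    return out


def build_record(app_fields, infrastructure_fields):
    """Relocate reserved application fields, then add the infrastructure's."""
    record = relocate_reserved(app_fields)
    for name, value in infrastructure_fields.items():
        record[TRUSTED + name] = value
    return record
```

Because every reserved name gets the same extra prefix, two different submitted names never collapse into one, and nothing the application sends can end up directly under 'trusted.'.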

...

If the prefix is unique there is a very low chance that the resolved fields would be overwritten unintentionally.

Exactly.

Now, from the discussion a couple of weeks ago, people have a really hard time with 'trusted' (arguments crop up about how much you can really trust it), so we need to pick something else.

We could use 'lumberjack.' for trusted and 'beaver.' for forged :-) These are a bit long; someone with more creativity can pick a couple of names.

David Lang

David Lang

12:51 a.m.

On Mon, 8 Oct 2012, david@lang.hm wrote:

...

we could do lumberjack. for trusted and beaver. for forged :-) This is a bit long, someone with more creativity can pick a couple of names.

Using the terminology from the wiki page listed in the subject, these would be 'objects' instead of 'subtrees', and the character at the end would be a '!' instead of a '.'.

David Lang

Rainer Gerhards

1:31 p.m.

...

using the terminology from the wiki page listed in the subject, these would be 'objects' instead of 'subtrees' and the character at the end would be a '!' instead of a '.'

Wasn't '!' intended to specify subtrees? I think when I last asked, that was the case. I know I switched from "." to "!" in rsyslog after asking...

Rainer

David Lang

6:04 p.m.

On Tue, 9 Oct 2012, Rainer Gerhards wrote:

...

...
using the terminology from the wiki page listed in the subject, these would be 'objects' instead of 'subtrees' and the character at the end would be a '!' instead of a '.'

Wasn't '!' intended to specify subtrees? I think when I last asked, that was the case. I know I switched from "." to "!" in rsyslog after asking...

Yes, that's why I was suggesting that we use it as the prefix delimiter.

People who want to treat everything as flat can just use it as a prefix.

People who want to use trees will have it fall out into trees per the spec.
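
To illustrate the two readings of '!', here is a small sketch that treats the same field names either as flat prefixed names or as a tree. The helper names are hypothetical, and the sketch assumes no name is used both as a field and as a subtree.

```python
def with_prefix(flat, prefix):
    """Flat view: select fields by treating the reserved name as a prefix."""
    return {name: value for name, value in flat.items()
            if name.startswith(prefix)}


def to_tree(flat, sep="!"):
    """Tree view: split each name on the delimiter and nest accordingly."""
    tree = {}
    for name, value in flat.items():
        *parents, leaf = name.split(sep)
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{part!r} is both a field and a subtree")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{leaf!r} is both a field and a subtree")
        node[leaf] = value
    return tree
```

Both views are derived from the same names, so a producer does not need to know which view a consumer prefers.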

David Lang

William Heinbockel

2:12 p.m.

On Mon, 2012-10-08 at 16:51 -0700, david@lang.hm wrote:

...

On Mon, 8 Oct 2012, david@lang.hm wrote:

...
we could do lumberjack. for trusted and beaver. for forged :-) This is a bit long, someone with more creativity can pick a couple of names.

Okay. I see where this is going. David, sorry for the prior discussion of the "trusted" fields. I now understand the point you were trying to make.

Generally, there are several groupings of fields:

* Application - the fields & structures added by the application when the event record is created.
* Log system - the "trusted" fields added by the log service and system.
* Other fields added by relays and event consumers.

If we look at this, there is a natural nesting that occurs. First the application data is recorded, with whatever fields the application wishes. Then, the data is wrapped by the log system, similar to the syslog header vs. content, though we probably want to be more flexible in the "header" fields. I think this is some of what David was explaining with his "trusted" fields. They are not "trusted" from the point of view of security, but by the fact that they are placed there by a more trusted service and can be thought of as being more reliable.

Later additions can then wrap the original events.

The only problem with this approach is that the most used information from the original event ends up buried within this nesting of Matryoshkas.

Maybe there is another way to solve this problem using a flat(ter) structure?
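
The nesting concern can be made concrete with a small sketch. The "event" key and the helper names are hypothetical; the thread does not define a wrapping format.

```python
def wrap(inner, **added):
    """Return a new layer carrying its own fields plus the wrapped event."""
    return {**added, "event": inner}


def innermost(record):
    """Walk down to the original event and report how deep it was buried."""
    depth = 0
    while isinstance(record.get("event"), dict):
        record = record["event"]
        depth += 1
    return record, depth


app = {"msg": "login failed", "user": "alice"}
local = wrap(app, pid=1234, uid=1000)          # added by the log system
relayed = wrap(local, relay="relay1.example")  # added by a relay
```

Here the most frequently used field, msg, is only reachable as relayed["event"]["event"]["msg"], and each further hop adds another level.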

Miloslav Trmač

2:20 p.m.

----- Original Message -----

...

If we look at this, there is a natural nesting that occurs. First the application data is records, with whatever fields they wish. Then, the data is wrapped by the log system, similar to the syslog header vs. content, though we probably want to be more flexible in the "header" fields. I think this is some of what David was explaining with his "trusted" fields. They are not "trusted" from the point of security, but the fact that they are placed there by a more trusted service and can be thought of as being more reliable.

Later additions can then wrap the original events.

The only problem with this approach is that the most used information from the original event ends up buried within this nesting of Matryoshkas.

Could we not start reevaluating the very core of the design, and perhaps (I know, I want a lot...) just agree on the solution that has already been implemented, even if suboptimal?

While we shake our heads at CEE changing things, we discuss exactly the same thing here....

(I'm quite willing to prepare patches to Fedora-relevant components to get the field names changed to whatever the consensus is - once more. A second change would make me very grumpy.)

Mirek

Rainer Gerhards

2:32 p.m.

...

Could we not start reevaluating the very core of the design, and perhaps (I know, I want a lot...) just agree on the solution that has already been implemented, even if suboptimal?

While we shake our heads at CEE changing things, we discuss exactly the same thing here....

I think the prefix is probably worth considering, maybe system_* being kind of protected. On the other hand, I could also live with the current field list and protect it. The bottom line is that with a prefix, we can protect future fields whereas we cannot with the field list (for obvious reasons ;)).

I am ready to make this change. I also think we should stick with the rest of the implementation and let it evolve in the future (except maybe for the cookie, for reasons outlined in other mail).

In any case, if we want protection, we need to define what this protection is. A simple approach (aka "quick to patch") is to disallow overwrites to the protected set AND give the user the capability to override that.
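
One possible reading of this "protect, but let the user override" approach, as a sketch. The "system_" prefix comes from the message above; the function name and the allow_override switch are hypothetical, and the sketch assumes that "override" means an application-supplied value is allowed to stand.

```python
def merge_protected(app_fields, system_fields, prefix="system_",
                    allow_override=False):
    """Merge fields, protecting the prefixed set unless overridden."""
    record = {}
    for name, value in app_fields.items():
        if name.startswith(prefix) and not allow_override:
            continue  # overwrite attempt is disallowed: drop the app's value
        record[name] = value
    for name, value in system_fields.items():
        key = prefix + name
        if allow_override:
            record.setdefault(key, value)  # an app-supplied value stands
        else:
            record[key] = value
    return record
```

Because the check is on the prefix rather than on a fixed list, fields added to the protected set later are covered without changing this code.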

Rainer

David Lang

6:18 p.m.

On Tue, 9 Oct 2012, Miloslav Trmac wrote:

...

Could we not start reevaluating the very core of the design, and perhaps (I know, I want a lot...) just agree on the solution that has already been implemented, even if suboptimal?

While we shake our heads at CEE changing things, we discuss exactly the same thing here....

(I'm quite willing to prepare patches to Fedora-relevant components to get the field names changed to whatever the consensus is - once more. A second change would make me very grumpy.)

Once this is out in the real world, we're going to have to live with it for a long time. This is about our last chance to fix anything. We don't want to make major changes, but I do think we need some way to protect some fields, and I think a reserved prefix/subtree is far better than trying to protect individual fields.

David Lang

6:02 p.m.

On Tue, 9 Oct 2012, William Heinbockel wrote:

...

Okay. I see where this is going. David, sorry for the prior discussion of the "trusted" fields. I now understand the point you were trying to make.

Generally, there are several groupings of fields:

* Application - the fields & structures added by the application when the event record is created.
* Log system - the "trusted" fields added by the log service and system.
* Other fields added by relays and event consumers.

If we look at this, there is a natural nesting that occurs. First the application data is recorded, with whatever fields the application wishes. Then, the data is wrapped by the log system, similar to the syslog header vs. content, though we probably want to be more flexible in the "header" fields. I think this is some of what David was explaining with his "trusted" fields. They are not "trusted" from the point of view of security, but by the fact that they are placed there by a more trusted service and can be thought of as being more reliable.

Later additions can then wrap the original events.

The only problem with this approach is that the most used information from the original event ends up buried within this nesting of Matryoshkas.

Maybe there is another way to solve this problem using a flat(ter) structure?

It's only the 'trusted' stuff that needs 'protection', and even that only needs to be altered if it's bounced from machine to machine in a way that isn't trusted by the admin.

For example, if you have trusted!pid and you send it via a trusted mechanism, you don't change trusted!pid; you just add trusted!authreason='TLS transport' and trusted!relay!identity='key'.

So it doesn't have to get as nasty as you are thinking. Additional data is added, not nested.
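
A sketch of this "add, don't nest" behaviour at a relay, using the field names from the example above. The function name and its parameters are hypothetical.

```python
def relay_over_trusted_link(record, auth_reason, relay_identity):
    """Return a copy of the record with relay metadata added alongside."""
    out = dict(record)  # existing fields, including trusted!pid, are untouched
    out["trusted!authreason"] = auth_reason
    out["trusted!relay!identity"] = relay_identity
    return out


original = {"msg": "login failed", "trusted!pid": 1234}
relayed = relay_over_trusted_link(original, "TLS transport", "key")
```

The original fields stay at the same level, so consumers keep reading msg and trusted!pid exactly as before the hop.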

David Lang

Rainer Gerhards

1:38 p.m.

...

Then maybe the fields that are generated by the interface should have a prefix that by convention is not used for other fields. For example, instead of pid, use resolved_pid or something like that. And IMO an attempt to explicitly overwrite such a field in the call should return an error right away.

we could do lumberjack. for trusted and beaver. for forged :-) This is a bit long, someone with more creativity can pick a couple of names.

I strongly like the prefix idea. It makes it easy to protect these fields.

HOWEVER, this raises another question. How far will we go protecting them? This issue was also raised in the last discussion, but no solution was found.

The problem is with relaying. Let's say we have

App-A->local log socket-B->syslogd-C->remote syslogd

Where in C we have a network transfer. I assume that the syslogd will take "trusted" data from the OS at point B and at the same time can prohibit forged data from the app. HOWEVER, what happens at the remote syslogd after the network transfer in C? Do we permit it to trust the data contained in the network packet - or what do we need to do?

IMHO, the solution is to accept that data, but recommend that the syslogd implementation (meaning any logger) use different policies, e.g. trusted data may not be accepted via regular transfer, but only via TLS-protected transfer with mutual authentication (or at least sender authentication).
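
A sketch of such a receive-side policy, under stated assumptions: the Transport type, the function names, and the 'trusted!' and 'forged!' spellings are hypothetical, and relocating rejected fields under 'forged!' borrows the relocation idea from earlier in the thread rather than anything specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transport:
    tls: bool
    sender_authenticated: bool


def default_policy(transport):
    """Accept trusted fields only over TLS with an authenticated sender."""
    return transport.tls and transport.sender_authenticated


def receive(record, transport, policy=default_policy):
    """Keep trusted fields if the policy allows it, else relocate them."""
    if policy(transport):
        return dict(record)
    out = {}
    for name, value in record.items():
        if name.startswith(("trusted!", "forged!")):
            out["forged!" + name] = value
        else:
            out[name] = value
    return out
```

The policy is a parameter so that a different rule can be substituted for the default without touching the receive path.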

Rainer

David Lang

5:59 p.m.

On Tue, 9 Oct 2012, Rainer Gerhards wrote:

...

IMHO, the solution is to accept that data, but recommend that the syslogd implementation (meaning any logger) use different policies, e.g. trusted data may not be accepted via regular transfer, but only via TLS-protected transfer with mutual authentication (or at least sender authentication).

I think that is going to be up to the sysadmin to determine how trusted the data is.

What you are suggesting is a reasonable default.

David Lang

lumberjack-developers@lists.fedorahosted.org

participants (5)

David Lang
Dmitri Pal
Miloslav Trmač
Rainer Gerhards
William Heinbockel