Using libumberlog
by William Heinbockel
Okay,
I am sure I'm doing something wrong here, but I cannot get the
preloading to work for the latest libumberlog on Fedora 16.
Did the good ol' `make && make install` with no issues. Installed the
lib to `/usr/local/lib/libumberlog*`
Now I cannot figure out why the preload isn't working...
I run `ldd` on a syslog-dependent executable, such as `logger`, and can
verify that libumberlog and libjson are preloaded and being
linked.
However, all of the messages appearing in /var/log/messages use the
RFC3164 unstructured text, instead of being wrapped in the @cee json
structure.
Is there another env config or something I'm missing?
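For reference, a quick way to tell the two forms apart in a log file is sketched below. It assumes the structured payload is a JSON object introduced by an `@cee:` cookie; the function name and the sample lines are made up for illustration and are not libumberlog output.

```python
import json

CEE_COOKIE = "@cee:"  # assumed marker that precedes the JSON payload


def parse_cee(message):
    """Return the decoded JSON object if `message` carries a CEE payload, else None."""
    idx = message.find(CEE_COOKIE)
    if idx == -1:
        return None  # plain, unstructured text
    payload = message[idx + len(CEE_COOKIE):].strip()
    try:
        obj = json.loads(payload)
    except ValueError:
        return None  # cookie present but payload is not valid JSON
    return obj if isinstance(obj, dict) else None


# Hypothetical sample lines, not taken from a real system.
plain = "Mar 21 10:00:00 host logger: hello"
structured = 'Mar 21 10:00:00 host logger: @cee:{"msg":"hello"}'
print(parse_cee(plain))
print(parse_cee(structured))
```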
Thanks,
William Heinbockel
12 years, 2 months
field names and taxonomy
by Botond Botyanszki
Hi,
The discussions have finally started to focus on field names, so I worked up
the courage to start a new thread on this.
There are some who think field names are not important as people/log
emitters will use whatever they see fit. I'm in the opposite camp and I do
think it is important to come up with a list of field names and to decide
whether we want to refer to the same thing as AccountName, LoginName or
UserName for example. This is a list which should consist of field names
more than the few which were discussed lately on the list (host, severity,
timestamp...)
We were keeping an eye on the CEE effort a while back, and there were
some field list and taxonomy drafts flying around which I found in the
CEE archives. But then came a long silence, followed by radical changes to
these a few months later.
Actually I even dared to say this in the docs:
"nxlog will try to use the Common Event Expression standard for the field
names once the standard is stable."
http://nxlog-ce.sourceforge.net/nxlog-docs/en/nxlog-reference-manual.html...
Anyway, there was no time to waste and after looking at other approaches
(CEF, XDAS etc) we came up with our own list of fields and event taxonomy.
Nxlog tries to be consistent with field names internally as much as
possible and this allows easy translation between some formats (e.g.
eventlog to syslog) without having to remap them manually in the config.
If there is interest, I could publish the list of fields (it is already
scattered in the nxlog docs and is also in the sources as *-fields.xml)
and also our taxonomy dictionary.
Note that I still have hopes for using a common standard defined by this
CEE/lumberjack effort for field names and taxonomy inside nxlog. If
other tools would adopt it, this would make a lot of people happy.
On the other hand, if the intention is to use the current cryptic names
(e.g. p_acct) as opposed to more readable naming, then there is nothing
left but to use some optional translation (field renaming) to help
people deal with it.
If nothing happens, everybody will just keep inventing their own format
and naming schema.
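To illustrate what such an optional translation could look like, here is a minimal sketch. Only p_acct comes from the discussion; the readable target name and the function name are assumptions chosen for the example, not an agreed dictionary.

```python
# Hypothetical rename table: terse name -> readable name.
# Only "p_acct" appears in the discussion; "AccountName" is an assumed target.
RENAMES = {"p_acct": "AccountName"}


def rename_fields(event, renames=RENAMES):
    """Return a copy of `event` with known keys translated; other keys pass through."""
    return {renames.get(key, key): value for key, value in event.items()}


print(rename_fields({"p_acct": "root", "msg": "login ok"}))
```

With a shared dictionary this table would be the same for every tool; without one, each tool ends up shipping its own.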
Regards,
Botond
12 years, 2 months
New version of selog.h and observations
by Dmitri Pal
Hi,
1) I added more types. I think the more types the API supports out of
the box, the more convenient it is for the user. The whole point is that the
caller, i.e. the application developer, would not need to convert the data or
cast it. Maybe we should eventually add some well-known structures, like
sockaddr for example, as a type. It is always a pain to convert
addresses. The library can do it itself. But this can be added as we go. The
list of types in this header is good enough for starters.
2) I added arrays. An array can contain values of different types for the
same attribute. Keith, does the XML schema with types take this into
account?
3) Since we say we do not need LSON, I removed it from the examples. An
interesting observation: it would not have worked the way I proposed it,
with subtypes in names, anyway because of the arrays. It would have
limited the arrays to single-type arrays, which is generally not the case,
for example for the proposed XML for auditd.
4) Some thoughts about arrays in the event. How are the arrays created?
Does the developer know all the values at the same time or build the array
gradually? How should array elements be referenced? What does it mean when
there are many KVPs with the same key and different values? Is it one key with
many values, or one key with one value where the value should be overwritten?
Here is my take:
a) If the developer knows all values in advance he can just specify them
in one call like in the example on line 176. For example in the auditd
case it will be convenient as the interpreted and raw values are known
at the same time.
b) The data is always added to the event and never modified unless the
key is explicitly deleted. This means that the event will logically
treat each key as an array of 1 and would be able to add another value
if the KVP with the same key is specified again. This would allow
building part of the event in a loop.
c) Events are generally not editable; they can only be added to. In some
cases, though, it might be beneficial to drop a KVP from an already existing
set. This can be done by specifying a special value SELOG_DEL_ATTR. See lines
48 and 179. If the key has more than one value the whole array is deleted.
d) There is no need to provide a way to access elements of the array, as
this would create too much complexity for a corner case. Instead,
I would suggest (if people ask) adding a way to delete a specific
element from an already existing KVP array. This can be done with yet
another special value, SELOG_DEL_ATTRELEM, N, where N is the index. This,
however, can be added later if needed.
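The semantics in a) through c) can be modelled in a few lines. This is only a sketch of the behaviour described above, written in Python for brevity; it is not the selog API, and the class and method names are invented for the example.

```python
DEL_ATTR = object()  # stand-in for the SELOG_DEL_ATTR special value


class Event:
    """Toy model: every key holds a list of values; values are only appended."""

    def __init__(self):
        self.attrs = {}  # key -> list of values, possibly of mixed types

    def add(self, key, *values):
        # c) the special value drops the key together with its whole array
        if len(values) == 1 and values[0] is DEL_ATTR:
            self.attrs.pop(key, None)
            return
        # a) several values in one call, b) repeated calls extend the array
        self.attrs.setdefault(key, []).extend(values)


ev = Event()
ev.add("uid", 0, "root")       # interpreted and raw value known together
for path in ("/a", "/b"):      # array built gradually in a loop
    ev.add("path", path)
ev.add("uid", DEL_ATTR)        # removes "uid" and both of its values
print(ev.attrs)
```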
If you agree with my approach to arrays I can start on the implementation.
I really can do an implementation of this interface and produce the
results pretty quickly. Any objections?
But I can't start unless I hear that it really makes sense to do.
We can definitely iterate and polish, but I can't afford to do the work
and then throw it away, sorry.
--
Thank you,
Dmitri Pal
Sr. Engineering Manager IPA project,
Red Hat Inc.
-------------------------------
Looking to carve out IT costs?
www.redhat.com/carveoutcosts/
12 years, 2 months
connection to syslog (or equivalent)
by David Lang
I posted some of this at the bottom of a long message yesterday that
nobody replied to, so I broke it out into a separate thread.
The traditional syslog interface has been a write-only interface, with the
only feedback being that the write could block.
With the new formats and new syslog replacement library, does there need
to be some rethinking of this interface?
JSON can be pushed through the traditional syslog interface with no
changes (which is one of the attractions of using JSON), but when we
support additional formats, especially ones with type information, we need
to have a way to tell what format to use when writing the log.
This could be done by configuring things in the openlog call, but that
requires changes to every application.
I think it would be better to have the serialization format be determined
automatically by the library if possible.
One way to do this is to have a handshake.
When an app opens /dev/log, the syslog server sends a capabilities message.
The client can ignore this message and send traditionally formatted syslog
data to the server.
Or, the client can read the capabilities message and then pick which
supported format (potentially with compression options) to use to send
data to the server.
This approach lets the format be anything that the library supports, under
the control of the sysadmin of the system (by configuring what
capabilities the syslog daemon advertises) without the application
programmer needing to know anything about these formats.
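The client side of such a handshake might look roughly like the sketch below. The capabilities message is assumed here to be a space-separated list of format names in the server's order of preference; that wire format, the format names, and the function name are all assumptions for illustration, since nothing has been defined yet.

```python
def choose_format(capabilities, supported=("json", "text")):
    """Pick the first server-advertised format that the library also supports.

    `capabilities` is the (assumed) space-separated capabilities message, or
    None/empty if the server sent nothing. Falls back to traditional syslog
    text, which is what a client that ignores the handshake would send anyway.
    """
    if capabilities:
        for name in capabilities.split():
            if name in supported:
                return name
    return "text"


print(choose_format("bson json text"))  # library without BSON support
print(choose_format(None))              # old server, no capabilities message
```

The application never sees this choice; the sysadmin controls it by changing what the daemon advertises.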
thoughts?
David Lang
12 years, 2 months
wrap-up of a night of traffic
by Rainer Gerhards
Hi folks,
I thought I'd spare us another set of 10 to 20 individual replies and try to
wrap up the discussion as far as I am concerned. This is *my* *personal*
opinion and take on the situation, as well as my plan (which can always be
altered by good arguments).
- evolve vs. revolve
I still think it would be most useful to get applications to emit real
structured data (not just system-obtained metadata) and submit this via the
already-defined cee-enhanced syslog way to the "system event router" (aka
"syslogd" ;)). This would be imperfect but a big win. Umberlog supports it
and has great potential to promote that way of doing things. Helping to
make this happen and fully supporting it is my top priority.
- JSON vs. XML
If XML carries type information and JSON does not, simply saying "we can
convert between the two" IMHO is dangerous. Lossy transformation is always very
problematic. I think we should come to a common understanding of whether types
are needed or not. If they are, they need to be supported by JSON as well (or
JSON abandoned, which I would not recommend). If the consensus is that lossy
transformation is OK, I would not object to it. I'd also support the
implementation, albeit with lower priority (just because of my limited
resources, not to turn s/b off;)). I admit that I do not have a clear
position on whether types are required or not; I see good and bad in both
cases. But I dislike different representations if they are semantically
different (which they are if one has type info and the other does not).
- new APIs
Umberlog definitely is on the right path, especially if we manage to merge it
into glibc. Dmitri's efforts sound very reasonable and useful to me. I have
yet to decide if I will continue to work on libee. Maybe I'd better join
forces with Dmitri. That would have the nice benefit that liblognorm would
integrate with it, being a prime vehicle to turn semi-text logs into
structured logs (even without the syslogd!).
- new syslog interface
Bidirectional. While I think it is too early, I would definitely support it.
As far as it has been described on the list, it seems fairly easy to do. With
lib support, it could even be pretty transparent for apps. If we take this
path, let's start with the local protocol. Let's not also consider a new
network protocol right at the beginning. Maybe that's just based on my position.
- Implementing...
Probably just my personal position: it is totally impossible for me to do
everything that was proposed in parallel. I simply do not have enough time
for that. So it may be good if we could at least come to a common
understanding of priorities.
I guess these are the main points ;)
Rainer
12 years, 2 months
Re: [lumberjack] how many steps to do -
by Rainer Gerhards
+1
Well said :-)
Rainer
Gergely Nagy <algernon(a)balabit.hu> wrote:
"Rainer Gerhards" <rgerhards(a)hq.adiscon.com> writes:
>> With the current plan we ask application developers to change things
>> twice. First to the new syslog umberlog interface and then later to a
>> better interface. IMO this is wrong. I tried to raise concerns about
>> this several times but have not been heard.
>> IMO we need to define one interface that the application developers
>> would use and start to migrate to and then evolve the infrastructure
>> under it.
>>
>> We should take selog or the like, polish it, wrap it around ul_log or pure
>> syslog, and have it as the interface. While developers migrate to it,
>> we will evolve the library and syslog implementation under it to
>> serialize and format things into JSON, XML etc., but developers will have
>> to change things only once.
>
> This definitely is a valid argument, but I think we need to weigh the pros
> and cons. Too much change in a single instant is very hard to sell. Too small
> steps don't make sense, either.
>
> My personal opinion (and experience) is that smaller steps work better than
> larger ones. There are obviously people in the absolute opposite camp. I
> still think that evolution works better than revolution.
There's one more thing I would like to add, regarding the discussion
about handshake and how to change/replace/extend/whatever /dev/log: why
care?
I don't think there must be a single interface to get logs from one
application to another, nor should we care about the transportation
issue, unless in context of libc (but more on that later, in another
thread).
Why? Because there's no single size that fits all. We want to be able to
*work with* logs, so we need a format, or representation that makes
sense, is easy to work with, and is pretty much a standard. How it
arrives from the producer to the consumer, is none of our business, in
my opinion.
Like Rainer, I'm not against anyone trying to go down this route, but
that's a route that I, personally, do not wish to tread.
--
|8]
_______________________________________________
lumberjack-developers mailing list
lumberjack-developers(a)lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/lumberjack-developers
12 years, 2 months
Syslog/Lumberjack compatibility spec
by Heinbockel, Bill
>
>Comes close to what I am thinking/doing. But I'd like to see some kind of a
>spec. Maybe I am just overdoing that part ;) But I think we need a very easy
>way for folks to actually use the new stuff. Probably means we should talk to
>OS consumers. But, again, that's easier with a small spec. Are we mature
>enough for that?
>
+1
We need some working documentation of these discussions. I have a hard time keeping track of them.
Rainer/Gergely: Can you put this together or at least send me some thoughts?
I would like to get this posted to the lumberjack wiki as a reference.
Also, we should do our best to separate the needs of the syslog protocol from the syslog message format.
I see priority/facility and most of the header information as necessary for the syslog protocol, but not at all for the message format. If similar fields are provided in the message, my suggestion would be to treat them as being associated with the event. The syslog header fields should be kept as part of the syslog protocol (not the event message).
12 years, 2 months
Value Types in event logs (Re: syslog-like API for structured messages)
by William Heinbockel
Splitting this conversation into a separate thread...
While I agree that being able to transmit type information with logs
is a noble goal, there are many nuances, especially across JSON and
XML.
JSON handles a few basic types well, namely string, int, double,
boolean, but will require additional work to support other types, such
as datetime.
We need to determine if this is worth addressing. Seeing that the most
popular format will probably be JSON over Syslog, we will lose the
type information if it is not made available.
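A small round trip shows the loss in practice. This is just a sketch: the field names are made up, and writing the timestamp as an ISO 8601 string is one possible convention, not something that has been agreed on.

```python
import json
from datetime import datetime, timezone

# Hypothetical event with the basic JSON types plus one datetime value.
event = {
    "count": 3,
    "ok": True,
    "ratio": 0.5,
    "time": datetime(2012, 3, 21, 16, 12, tzinfo=timezone.utc),
}

# json.dumps cannot encode datetime by itself, so it is written as a string.
encoded = json.dumps(event, default=lambda value: value.isoformat())
decoded = json.loads(encoded)

# The basic types come back unchanged; the timestamp comes back as a string.
types = {key: type(value).__name__ for key, value in decoded.items()}
print(types)
```

The receiver can only recover the datetime if it already knows, from somewhere outside the message, that "time" should be parsed as one.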
XML has more flexibility with typing, but only in combination with XML
Schema. This means that you either have to define all of the field
names a priori in XML Schema, or define a minimal schema that binds
type information to predefined type elements.
For example, in order to support this
`<Event><dst_ip>1.2.3.4</dst_ip></Event>`
I need to have a related XML Schema that defines dst_ip as having a type of
IPv4 Address (otherwise it will be treated as a string or duck-typed
into an IPv4 address).
Alternatively, the type can be carried in an attribute:
`<Event><dst_ip type="ipv4">1.2.3.4</dst_ip></Event>`
This poses similar issues. XML Schema cannot validate the @type
attribute based on the dst_ip value (though this is fairly trivial to
do with XSLT or similar). You also have the issue of what happens if dst_ip
is defined as an xs:int in the schema but @type is "ipv4": which value
type wins?
Also, this approach works well for atomic types, but does not work as
well if it is a structure and contains child elements.
For the best compatibility with XML Schema:
`<Event><ipv4 name="dst_ip">1.2.3.4</ipv4></Event>`
This works better for XML Schema validation, but is not as natural to
use as the former examples.
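To make the two typed layouts easier to compare, here is a sketch that generates both from the same (name, type, value) triple. The helper names are invented for the example and it only covers atomic values, which is exactly the easy case.

```python
import xml.etree.ElementTree as ET


def typed_attribute(name, type_name, value):
    """Field name as the element, type carried in a `type` attribute."""
    root = ET.Element("Event")
    ET.SubElement(root, name, {"type": type_name}).text = value
    return ET.tostring(root, encoding="unicode")


def typed_element(name, type_name, value):
    """Type as the element, field name carried in a `name` attribute."""
    root = ET.Element("Event")
    ET.SubElement(root, type_name, {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")


print(typed_attribute("dst_ip", "ipv4", "1.2.3.4"))
print(typed_element("dst_ip", "ipv4", "1.2.3.4"))
```

In the second layout the set of element names is the fixed set of types, which is why a minimal schema can describe it without knowing the field names in advance.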
I have no problem with either of the above solutions. After some
thought, option #2 might be the best, but we need to figure out how to
handle/represent structures and make this representable with XML
Schema. As I mention above, this is fairly trivial for atomic types,
but I don't know how to do it for structures.
On Wed, Mar 21, 2012 at 4:12 PM, Botond Botyanszki <boti(a)nxlog.org> wrote:
> On Wed, 21 Mar 2012 14:15:47 -0400
> William Heinbockel <wheinbockel(a)gmail.com> wrote:
>
>> On Wed, Mar 21, 2012 at 2:12 PM, Dmitri Pal <dpal(a)redhat.com> wrote:
>> > On 03/20/2012 12:00 PM, david(a)lang.hm wrote:
>> >> On Tue, 20 Mar 2012, Gergely Nagy wrote:
>> >>
>> >>> david(a)lang.hm writes:
>> >>>
>> >>>> I think that we are going to need a type system before long.
>> >>>
>> >>> Yeah, but not in JSON, where it would be bolted upon.
>> >>
>> >> That's reasonable. It just means we need to support more than just
>> >> JSON soon :-)
>> >
>> > Type system of JSON is good enough. It might be a good compromise between
>> > no types and everything has a schema.
> I'd call it 'better than nothing'. There are some types lacking, most
> notably the DateTime type, which are mostly essential in our case.
>
>> +1
>> While I have nothing against explicit typing, I don't see the need.
> -1
> If you only think about forwarding and storing text (based logs),
> probably there is no need for that. But once you need to analyze the data
> where you compare and sort values, knowing the type of the value is pretty
> much required.
>
>> I would like to have some way to align the JSON structures with XML
>> representations, though. The only real issue here is the mapping of
>> JSON arrays to a similar XML structure.
> I think mapping arrays is pretty straightforward:
> JSON:
> { "addr":["1.2.3.4","2.3.4.5"] }
> XML:
> <event>
> <addr>1.2.3.4</addr>
> <addr>2.3.4.5</addr>
> </event>
> The problem here is mapping the type information, which we discussed
> earlier and mostly agreed that squeezing it into JSON gets a little ugly.
>
Yep
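For what it's worth, the array mapping quoted above can be sketched as follows. It handles only flat events with string-like values; the function name is made up for the example.

```python
import xml.etree.ElementTree as ET


def event_to_xml(event):
    """Map a flat JSON-style dict to XML, repeating the element for array values."""
    root = ET.Element("event")
    for key, value in event.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            ET.SubElement(root, key).text = str(item)
    return ET.tostring(root, encoding="unicode")


print(event_to_xml({"addr": ["1.2.3.4", "2.3.4.5"]}))
```

One caveat with this mapping: a one-element array and a plain scalar produce the same XML, so going back from XML to JSON cannot tell them apart without extra information.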
12 years, 2 months
Proposed plan
by Dmitri Pal
Hi,
So where are we and what are the next steps?
1) There is a question about glibc and integration of the ul library
into it. I will take an action item to investigate.
2) I sent a header for review. Please review and provide feedback. Is it
the right API? Is it reasonable or is it completely off? The file is
attached yet again.
3) Are we satisfied with the latest XML spec Keith published? Please
review and ack/nack.
4) Do we need to have a call exposed from the syslog implementation that
would tell the library a preferred encoding (JSON, BSON, XML etc.)? Let
us have a thread about it. Rainer, what is your take on this?
5) Is there anything else that we need to discuss or review?
6) Do we need another call?
Sorry for jumping in. I just need to get organized myself to better plan
my time and creating a list and plan helps a lot.
--
Thank you,
Dmitri Pal
Sr. Engineering Manager IPA project,
Red Hat Inc.
12 years, 2 months