March 2012 - lumberjack-developers - Fedora Mailing-Lists

by William Heinbockel

Okay, I am sure I'm doing something wrong here, but I cannot get the preloading to work for the latest libumberlog on Fedora 16.

Did the good ol' `make && make install` with no issues. Installed the lib to `/usr/local/lib/libumberlog*`.

Now I cannot figure out why the preload isn't working... I run `ldd` on a syslog-dependent executable, such as `logger`, and can verify that libumberlog and libjson are preloaded and being linked. However, all of the messages appearing in `/var/log/messages` use the RFC 3164 unstructured text instead of being wrapped in the @cee JSON structure.

Is there another env config or something I'm missing?

Thanks,
William Heinbockel
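One way to tell the two forms apart is to look for the cookie at the start of the message text, once the syslog header has been stripped. The following is a minimal sketch, assuming the cookie is spelled `@cee:` and is followed by a JSON object; `parse_cee` is a hypothetical helper and not part of libumberlog.

```python
import json

# Assumed spelling of the cookie that marks a structured payload.
CEE_COOKIE = "@cee:"


def parse_cee(msg):
    """Return the decoded JSON object if msg carries the cookie, else None."""
    text = msg.lstrip()
    if not text.startswith(CEE_COOKIE):
        return None  # plain free-form text, no structured payload
    try:
        payload = json.loads(text[len(CEE_COOKIE):])
    except json.JSONDecodeError:
        return None  # cookie present, but what follows is not valid JSON
    # Only a JSON object counts as a structured event in this sketch.
    return payload if isinstance(payload, dict) else None


print(parse_cee('@cee:{"msg": "hello", "pid": 1234}'))
print(parse_cee("hello"))
```

Running a check like this over the message part of each line in the log would show whether any structured payloads arrived at all.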

12 years, 2 months


field names and taxonomy

by Botond Botyanszki

Hi,

Finally the discussions started to focus on field names, so I took the courage to start a new thread on this. There are some who think field names are not important, as people/log emitters will use whatever they see fit. I'm in the opposite camp and I do think it is important to come up with a list of field names and to decide whether we want to refer to the same thing as AccountName, LoginName or UserName, for example. This list should consist of more field names than the few which were discussed lately on the list (host, severity, timestamp...).

We were keeping an eye on the CEE effort a while back, and there were some field list and taxonomy drafts flying around which I found in the CEE archives. But then came a long silence, and radical changes to these a few months later. Actually I even dared to say this in the docs: "nxlog will try to use the Common Event Expression standard for the field names once the standard is stable." http://nxlog-ce.sourceforge.net/nxlog-docs/en/nxlog-reference-manual.html...

Anyway, there was no time to waste, and after looking at other approaches (CEF, XDAS etc.) we came up with our own list of fields and event taxonomy. Nxlog tries to be consistent with field names internally as much as possible, and this allows easy translation between some formats (e.g. eventlog to syslog) without having to remap them manually in the config. If there is interest, I could publish the list of fields (it is already scattered in the nxlog docs and is also in the sources as *-fields.xml) and also our taxonomy dictionary.

Note that I still have hopes for using a common standard defined by this CEE/lumberjack effort for field names and taxonomy inside nxlog. If other tools adopted it, this would make a lot of people happy. On the other hand, if the intention is to use the current cryptic names (e.g. p_acct) as opposed to more readable naming, then there is nothing left but to use some optional translation (field renaming) to help people deal with it. If nothing happens, everybody will just keep inventing their own format and naming schema.

Regards,
Botond
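The optional translation (field renaming) mentioned above could be as small as a lookup table applied to each event. This is only a sketch: the pairing of `p_acct` with `AccountName` is an assumption made for illustration (both names appear in the post, but not as an agreed mapping), and `rename_fields` is a hypothetical helper.

```python
# Hypothetical mapping from terse field names to readable ones.
# Pairing "p_acct" with "AccountName" is an assumption for illustration.
RENAMES = {
    "p_acct": "AccountName",
}


def rename_fields(event, renames=RENAMES):
    """Return a copy of event with known keys renamed; unknown keys pass through."""
    return {renames.get(key, key): value for key, value in event.items()}


event = {"p_acct": "alice", "host": "web01"}
print(rename_fields(event))
```

A table like this only helps if every tool ships the same one, which is the argument for settling the names in a shared list.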

12 years, 2 months


New version of selog.h and observations

by Dmitri Pal

Hi,

1) I added more types. I think the more types the API supports out of the box, the more convenient it is for the user. The whole point is that the caller, i.e. the application developer, would not need to convert the data or cast it. Maybe we should eventually add some well-known structures, like sockaddr for example, as a type. It is always a pain to convert addresses; the library can do it itself. But this can be added as we go. The list of types in this header is good enough for starters.

2) I added arrays. An array can contain values of different types for the same attribute. Keith, does the XML schema with types take this into account?

3) Since we say we do not need LSON, I removed it from the examples. An interesting observation: it would not have worked the way I proposed it, with subtypes in names, anyway because of the arrays. It would have limited the arrays to single-type arrays, which is generally not the case, for example for the proposed XML for auditd.

4) Some thoughts about arrays in the event. How are the arrays created? Does the developer know all the values at the same time, or build the array gradually? How to reference array elements? What does it mean that there are many KVPs with the same key and different values? Is it one key with many values, or one key with one value where the value should be overwritten? Here is my take:

a) If the developer knows all values in advance, he can just specify them in one call, like in the example on line 176. In the auditd case, for example, this will be convenient, as the interpreted and raw values are known at the same time.

b) The data is always added to the event and never modified unless the key is explicitly deleted. This means that the event will logically treat each key as an array of 1 and would be able to add another value if a KVP with the same key is specified again. This would allow building part of the event in a loop.

c) Events are generally not editable (they are only added to), but in some cases it might be beneficial to drop some KVP from an already existing set. This can be done by specifying a special value SELOG_DEL_ATTR. See lines 48 and 179. If the key has more than one value, the whole array is deleted.

d) There is no need to provide a way to access elements of the array, as this would create too much complexity for a corner case. Rather than that, I would suggest (if people ask) adding a way to delete a specific element from an already existing KVP array. This can be done by yet another special value, SELOG_DEL_ATTRELEM, N, where N is the index. This, however, can be added later if needed.

If you agree with my approach to arrays, I can start on the implementation. I really can do an implementation of this interface and produce results pretty quickly. Any objections? But I can't start unless I hear that it really makes sense to do. We can definitely reiterate and polish, but I can't afford doing the work and then throwing it away, sorry.

--
Thank you,
Dmitri Pal

Sr. Engineering Manager IPA project,
Red Hat Inc.

-------------------------------
Looking to carve out IT costs?
www.redhat.com/carveoutcosts/
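The array semantics in points a) to c) can be modelled in a few lines. This is a sketch in Python rather than the C interface from the attached header, which is not reproduced here; the `Event` class, its `add` method and the `DEL_ATTR` sentinel are hypothetical stand-ins (the last one for SELOG_DEL_ATTR).

```python
# Hypothetical sentinel standing in for the SELOG_DEL_ATTR value named above.
DEL_ATTR = object()


class Event:
    """Append-only key/value event: every key maps to a list of values."""

    def __init__(self):
        self._attrs = {}

    def add(self, key, *values):
        """Append values to key; passing DEL_ATTR drops the key and its whole array."""
        if len(values) == 1 and values[0] is DEL_ATTR:
            self._attrs.pop(key, None)
            return
        self._attrs.setdefault(key, []).extend(values)

    def as_dict(self):
        """Return a plain dict copy of the event's attributes."""
        return {key: list(vals) for key, vals in self._attrs.items()}


ev = Event()
ev.add("addr", "1.2.3.4", 16909060)  # (a) all values known up front, mixed types
for user in ("alice", "bob"):         # (b) same key repeated, array built in a loop
    ev.add("user", user)
ev.add("tmp", "scratch")
ev.add("tmp", DEL_ATTR)               # (c) delete the key and every value under it
print(ev.as_dict())
```

Treating every key as an array from the start means a single-valued attribute is just the one-element case, so no separate overwrite rule is needed.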

12 years, 2 months


connection to syslog (or equivalent)

by David Lang

I posted some of this at the bottom of a long message yesterday that nobody replied to, so I broke it out into a separate thread.

The traditional syslog interface has been a write-only interface, with the only feedback being that the write could block. With the new formats and the new syslog replacement library, does there need to be some rethinking of this interface?

JSON can be pushed through the traditional syslog interface with no changes (which is one of the attractions of using JSON), but when we support additional formats, especially ones with type information, we need to have a way to tell what format to use when writing the log.

This could be done by configuring things in the openlog call, but that requires changes to every application. I think it would be better to have the serialization format be determined automatically by the library if possible.

One way to do this is to have a handshake. When an app opens /dev/log, the syslog server sends a capabilities message. The client can ignore this message and send traditional formatted syslog data to the server. Or the client can read the capabilities message and then pick which supported format (potentially with compression options) to use to send data to the server.

This approach lets the format be anything that the library supports, under the control of the sysadmin of the system (by configuring what capabilities the syslog daemon advertises), without the application programmer needing to know anything about these formats.

Thoughts?

David Lang
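The client side of such a handshake might look roughly like this. No wire format for the capabilities message is defined in the post, so the `formats=a,b,c` line, the `choose_format` function and the format names are all assumptions made for illustration. The sketch takes the first format the server lists that the library also supports, which keeps the choice in the sysadmin's hands.

```python
# Formats this hypothetical client library knows how to serialize.
SUPPORTED = {"json", "xml"}
# Name used here for traditional formatted syslog data (the no-handshake case).
TRADITIONAL = "traditional"


def choose_format(capabilities_line, supported=SUPPORTED):
    """Pick the first server-advertised format the library supports.

    The "formats=a,b,c" syntax is an assumption; an absent, unreadable or
    unusable capabilities message falls back to the traditional format.
    """
    if not capabilities_line:
        return TRADITIONAL
    name, sep, value = capabilities_line.partition("=")
    if not sep or name.strip() != "formats":
        return TRADITIONAL
    for item in value.split(","):
        fmt = item.strip().lower()
        if fmt in supported:
            return fmt
    return TRADITIONAL


print(choose_format("formats=xml,json"))
print(choose_format(None))
```

An application that never reads the capabilities message behaves exactly like the `None` case, so existing programs keep working unchanged.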

12 years, 2 months


wrap-up of a night of traffic

by Rainer Gerhards

Hi folks,

I thought I'd spare us another set of 10 to 20 individual replies and try to wrap up the discussion as far as I am concerned. This is *my* *personal* opinion and take on the situation, as well as my plan (which can always be altered by good arguments).

- evolve vs. revolve

I still think it would be most useful to get applications to emit real structured data (not just system-obtained metadata) and submit this via the already-defined cee-enhanced syslog way to the "system event router" (aka "syslogd" ;)). This would be imperfect but a big win. Umberlog supports it and has great potential for promoting that way of doing things. Helping to make this happen and fully supporting it is my top priority.

- JSON vs. XML

If XML carries type information and JSON does not, simply saying "we can convert between the two" IMHO is dangerous. Lossy transformation is always very problematic. I think we should come to a common understanding of whether types are needed or not. If they are, they need to be supported by JSON as well (or JSON abandoned, which I would not recommend). If the consensus is that lossy transformation is OK, I would not object to it. I'd also support the implementation, albeit with lower priority (just because of my limited resources, not to turn s/b off ;)). I admit that I do not have a clear position on whether type is required or not; I see good and bad in both cases. But I dislike different representations if they are semantically different (which they are if one has type info and the other does not).

- new APIs

Umberlog definitely is on the right path, especially if we manage to merge it into glibc. Dmitri's efforts sound very reasonable and useful to me. I have yet to decide if I will continue to work on libee. Maybe I'd better join forces with Dmitri. That would have the nice benefit that liblognorm would integrate with it, being a prime vehicle to turn semi-text logs into structured logs (even without the syslogd!).

- new syslog interface

Bidirectional. While I think it is too early, I would definitely support it. As far as it has been described on list, it seems fairly easy to do. With lib support, it could even be pretty transparent for apps. If we take this path, let's start with the local protocol. Let's not also consider a new network protocol right at the beginning. Maybe that's just based on my position.

- Implementing...

Probably just my personal position: it is totally impossible for me to do everything that was proposed in parallel. I simply do not have enough time for that. So it may be good if we could at least come to a common understanding of priorities.

I guess these are the main points ;)

Rainer
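The lossy-transformation concern under "JSON vs. XML" is easy to reproduce with a timestamp, since JSON has no date/time type. The sketch below uses only the Python standard library; the field names and values are invented for illustration.

```python
import json
from datetime import datetime, timezone

# Invented example event; only the value types matter here.
event = {
    "msg": "login failed",
    "count": 3,
    "when": datetime(2012, 3, 21, 14, 15, 47, tzinfo=timezone.utc),
}

# JSON has no datetime type, so the value is flattened to an ISO 8601 string.
encoded = json.dumps(event, default=lambda value: value.isoformat())
decoded = json.loads(encoded)

# The integer survives the round trip; the timestamp comes back as a string.
print(type(event["count"]).__name__, "->", type(decoded["count"]).__name__)
print(type(event["when"]).__name__, "->", type(decoded["when"]).__name__)
```

A consumer can only recover the timestamp if it already knows, by field name or by schema, that `when` holds a date, which is exactly the out-of-band type information the discussion is about.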

12 years, 2 months


Re: [lumberjack] how many steps to do -

by Rainer Gerhards

+1

Well said :-)

Rainer

Gergely Nagy <algernon(a)balabit.hu> wrote:

"Rainer Gerhards" <rgerhards(a)hq.adiscon.com> writes:

>> With the current plan we ask application developers to change things
>> twice. First to new syslog umberlog interface and then later to a
>> better interface. IMO this is wrong. I tried to raise the concerns about
>> this several times but have not been heard.
>> IMO we need to define one interface that the application developers
>> would use and start to migrate to and then evolve the infrastructure
>> under it.
>>
>> We should take selog or like, polish it, wrap around ul_log or pure
>> syslog and have as an interface. While developers would migrate to it
>> we will evolve the library and syslog implementation under it to
>> serialize and format things into JSON XML etc but developer will have
>> to change things once.
>
> This definitely is a valid argument, but I think we need to weigh the pros
> and cons. Too much change in a single instant is very hard to sell. Too small
> steps don't make sense, either.
>
> My personal opinion (and experience) is that smaller steps work better than
> larger ones. There are obviously people in the absolute opposite camp. I
> still think that evolution works better than revolution.

There's one more thing I would like to add, regarding the discussion about handshake and how to change/replace/extend/whatever /dev/log: why care?

I don't think there must be a single interface to get logs from one application to another, nor should we care about the transportation issue, unless in context of libc (but more on that later, in another thread).

Why? Because there's no single size that fits all. We want to be able to *work with* logs, so we need a format, or representation, that makes sense, is easy to work with, and is pretty much a standard. How it arrives from the producer to the consumer is none of our business, in my opinion.

Like Rainer, I'm not against anyone trying to go down this route, but that's a route that I, personally, do not wish to tread.

--
|8]

_______________________________________________
lumberjack-developers mailing list
lumberjack-developers(a)lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/lumberjack-developers

12 years, 2 months


umberlog & glibc

by Rainer Gerhards

Hi all, I just read https://plus.google.com/u/0/101384639386588513837/posts/Pba8zssFP3z In the light of this, may it be worth re-considering trying to move the new style API into glibc? Rainer

12 years, 2 months


Syslog/Lumberjack compatibility spec

by Heinbockel, Bill

> Comes close to what I am thinking/doing. But I'd like to see some kind of
> a spec. Maybe I am just overdoing that part ;) But I think we need a very
> easy way for folks to actually use the new stuff. Probably means we should
> talk to OS consumers. But, again, that's easier with a small spec. Are we
> mature enough for that?

+1

We need some working documentation of these discussions. I have a hard time keeping track of them.

Rainer/Gergely: Can you put this together, or at least send me some thoughts? I would like to get this posted to the lumberjack wiki as a reference.

Also, we should do our best to separate the needs of the syslog protocol from the syslog message format. I see priority/facility and most of the header information as necessary for the syslog protocol, but not at all for the message format. If similar fields are provided in the message, my suggestion would be to treat them as being associated with the event. The syslog header fields should be kept as part of the syslog protocol (not the event message).
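The separation suggested above can be illustrated by peeling the protocol-level priority off a raw line before looking at the message. In the syslog protocol the `<PRI>` number encodes facility * 8 + severity. Everything else in this sketch is an assumption for illustration: the `split_pri` helper, the sample line, and the `@cee:` spelling of the payload cookie.

```python
import re

# <PRI> is one to three digits in angle brackets at the very start of the line.
PRI_RE = re.compile(r"^<(\d{1,3})>")


def split_pri(line):
    """Split a raw syslog line into (facility, severity, rest_of_line)."""
    match = PRI_RE.match(line)
    if match is None:
        raise ValueError("no <PRI> prefix")
    pri = int(match.group(1))
    if pri > 191:  # 23 * 8 + 7 is the largest valid value
        raise ValueError("PRI out of range")
    facility, severity = divmod(pri, 8)
    return facility, severity, line[match.end():]


# Invented sample: the payload carries its own "severity" field, which belongs
# to the event and is independent of the transport-level severity in <PRI>.
raw = '<13>Mar 21 14:15:47 host app: @cee:{"msg": "hello", "severity": "debug"}'
facility, severity, rest = split_pri(raw)
print(facility, severity)
print(rest)
```

Here the header says facility 1 and severity 5, while the event says "debug". Under the suggestion above both are kept, each in its own layer.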

12 years, 2 months


Value Types in event logs (Re: syslog-like API for structured messages)

by William Heinbockel

Splitting this conversation into a separate thread...

While I agree that being able to transmit type information with logs is a noble goal, there are many nuances, especially across JSON and XML.

JSON handles a few basic types well, namely string, int, double and boolean, but will require additional work to support other types, such as datetime. We need to determine if this is worth addressing. Seeing that the most popular format will probably be JSON over syslog, we will lose the type information if it is not made available.

XML has more flexibility with typing, but only in combination with XML Schema. This means that you either have to define all of the field names a priori in XML Schema, or define a minimal schema that binds type information to predefined type elements.

For example, in order to support this

`<Event><dst_ip>1.2.3.4</dst_ip></Event>`

I need to have a related XML Schema that defines dst_ip as having a type of IPv4 Address (otherwise it will be treated as a string or duck-typed into an IPv4 address).

`<Event><dst_ip type="ipv4">1.2.3.4</dst_ip></Event>`

This poses similar issues. XML Schema cannot validate the @type attribute based on the dst_ip value (though this is fairly trivial to do with XSLT or similar). You also have the issue of what happens if dst_ip is defined as an xs:int in the schema but @type is "ipv4": which value type wins? Also, this approach works well for atomic types, but does not work as well if the value is a structure and contains child elements.

For the best compatibility with XML Schema:

`<Event><ipv4 name="dst_ip">1.2.3.4</ipv4></Event>`

This works better for XML Schema validation, but is not as natural to use as the former examples.

I have no problem with either of the above solutions. After some thought, option #2 might be the best, but we need to figure out how to handle/represent structures and make this representable with XML Schema. As I mention above, this is fairly trivial for atomic types, but I don't know how to do it for structures.

On Wed, Mar 21, 2012 at 4:12 PM, Botond Botyanszki <boti(a)nxlog.org> wrote:
> On Wed, 21 Mar 2012 14:15:47 -0400
> William Heinbockel <wheinbockel(a)gmail.com> wrote:
>
>> On Wed, Mar 21, 2012 at 2:12 PM, Dmitri Pal <dpal(a)redhat.com> wrote:
>> > On 03/20/2012 12:00 PM, david(a)lang.hm wrote:
>> >> On Tue, 20 Mar 2012, Gergely Nagy wrote:
>> >>
>> >>> david(a)lang.hm writes:
>> >>>
>> >>>> I think that we are going to need a type system before long.
>> >>>
>> >>> Yeah, but not in JSON, where it would be bolted upon.
>> >>
>> >> That's reasonable. It just means we need to support more than just
>> >> JSON soon :-)
>> >
>> > Type system of JSON is good enough. It might be a good compromise between
>> > no types and everything has a schema.
> I'd call it 'better than nothing'. There are some types lacking, most
> notably the DateTime type, which are mostly essential in our case.
>
>> +1
>> While I have nothing against explicit typing, I don't see the need.
> -1
> If you only think about forwarding and storing text (based logs),
> probably there is no need for that. But once you need to analyze the data
> where you compare and sort values, knowing the type of the value is pretty
> much required.
>
>> I would like to have some way to align the JSON structures with XML
>> representations, though. The only real issue here is the mapping of
>> JSON arrays to a similar XML structure.
> I think mapping arrays is pretty straightforward:
> JSON:
> { "addr":["1.2.3.4","2.3.4.5"] }
> XML:
> <event>
> <addr>1.2.3.4</addr>
> <addr>2.3.4.5</addr>
> </event>
> The problem here is mapping the type information what we discussed
> earlier and mostly agreed that squeezing it into JSON gets a little ugly.
>
Yep
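The array mapping quoted above (one repeated element per array item) can be sketched with the Python standard library. The `json_to_xml` helper is hypothetical and handles only flat objects. It also shows the type problem the thread keeps returning to: once values become element text, the number 80 and the string "80" are indistinguishable.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(doc, root_tag="event"):
    """Map a flat JSON object to XML; each array item becomes a repeated element."""
    root = ET.Element(root_tag)
    for key, value in json.loads(doc).items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            child = ET.SubElement(root, key)
            child.text = str(item)  # type information is dropped at this point
    return ET.tostring(root, encoding="unicode")


print(json_to_xml('{ "addr":["1.2.3.4","2.3.4.5"] }'))

# A JSON number and a JSON string produce identical XML.
print(json_to_xml('{"port": 80}') == json_to_xml('{"port": "80"}'))
```

Going the other way, a converter would have to guess whether a single `<addr>` element was a one-item array or a scalar, so the mapping is not cleanly reversible either without a schema or a type attribute.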

12 years, 2 months


Proposed plan

by Dmitri Pal

Hi,

So where are we, and what are the next steps?

1) There is a question about glibc and integration of the ul library into it. I will take an action item to investigate.

2) I sent a header for review. Please review and provide feedback. Is it the right API? Is it reasonable, or is it completely off? The file is attached yet again.

3) Are we satisfied with the latest XML spec Keith published? Please review and ack/nack.

4) Do we need to have a call exposed from the syslog implementation that would tell the library a preferred encoding (JSON, BSON, XML etc.)? Let us have a thread about it. Rainer, what is your take on this?

5) Is there anything else that we need to discuss or review?

6) Do we need another call?

Sorry for jumping in. I just need to get organized myself to better plan my time, and creating a list and a plan helps a lot.

--
Thank you,
Dmitri Pal

Sr. Engineering Manager IPA project,
Red Hat Inc.

12 years, 2 months

