Christoph Höger wrote:
I am currently trying to add incremental parsing (TM) to a Java-based
LR(n) parser. To prove that it works, I have the following test setup:
For every given test input file the following steps are run:
- parse that file normally
- for every token (well, for large files every 100th token or so) in
that file, add a test case which assumes that token has changed and runs
the incremental parser with that information
For a really large file this gives me some 100 test cases and two
_really_ strange things.
1. Memory growth from test case to test case: As I add tests dynamically,
I run a single test suite which creates all test cases, so I cannot fork
a new VM for every test case. That should not make any difference,
because in tearDown() I set all fields of the test case to null and
manually invoke the garbage collector. Still, the heap grows by
about 1 or 2 MB with every test case run, resulting in huge performance
drops when the heap runs full and finally an OutOfMemoryError
- Remember: I set _all_ fields of the test cases to null after the test
was run -
2. And this is where the WTF-o-meter goes through the ceiling: On one
test the non-incremental parser takes 300ms on an 18K input file
containing ~ 2800 tokens (which is a nice performance, I think). When I
run my incremental parser with the very first token marked as changed,
it's finished after only 30ms. That would be a very good result if the
incremental parser implementation were already finished, but it is not!
Currently those two invocations run exactly the same code path! So I see
a reduction to 10% of the runtime just by invoking the same code path again.
If I encountered these two phenomena one at a time, I would blame (or
praise, in the case of 2.) OpenJDK's JVM for it, but seeing both leaves me
guessing at some kind of code caching mechanism in the background.
Is that some kind of hot feature? And how (for the sake of benchmarking)
can I deactivate it?
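The speed-up in 2. is consistent with ordinary JVM warm-up: the first
invocation pays for class loading and runs in the interpreter, while later
invocations may run JIT-compiled code. The sketch below is not the parser
from the post; workload() is a hypothetical stand-in for any deterministic,
CPU-bound code path, and the class and method names are made up for
illustration. It times the same code path several times in one VM so the
effect can be observed directly.

```java
class WarmupDemo {
    // Hypothetical stand-in for the parser: deterministic, CPU-bound work.
    static long workload(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    // Runs the same code path `runs` times and returns the wall-clock
    // duration of each run in nanoseconds.
    static long[] timeRuns(int runs, int n) {
        long[] nanos = new long[runs];
        long sink = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += workload(n);
            nanos[r] = System.nanoTime() - start;
        }
        // Use the result so the JIT cannot discard the workload as dead code.
        if (sink == Long.MIN_VALUE) {
            System.out.println("unlikely sink value: " + sink);
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] t = timeRuns(10, 2_000_000);
        for (int i = 0; i < t.length; i++) {
            System.out.printf("run %d: %d us%n", i, t[i] / 1000);
        }
    }
}
```

On HotSpot-based JVMs such as OpenJDK, starting the VM with -Xint runs it in
interpreter-only mode, which removes the JIT from the picture, and
-XX:+PrintCompilation logs methods as they are compiled. For benchmarking,
the more common approach is to leave the JIT on and discard a number of
warm-up runs before measuring.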
1. I think the NetBeans Profiler would help you get objective
data on the problems you describe.
2. Note that the loaded test classes won't be collected by the GC as long as their
class loader is in use.
3. The Parsing API may also be of interest to you.
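Before reaching for a profiler, a rough check of the heap growth can be made
from inside the test run. The sketch below is an illustration under stated
assumptions, not the test harness from the question: HeapCheck and its
methods are hypothetical names, and the `retained` list stands in for
whatever long-lived object (a suite, a static cache, a listener) might still
reference per-test data. It shows that nulling fields does not help while
such a reference exists, and that the used heap then grows step by step.

```java
import java.util.ArrayList;
import java.util.List;

class HeapCheck {
    // Approximate used heap in bytes. System.gc() is only a request to
    // the VM, so the value is an estimate, not an exact figure.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Allocates `bytesPerStep` per step, keeps every allocation reachable
    // from a long-lived list, and records the used heap after each step.
    static long[] measureRetainedGrowth(int steps, int bytesPerStep) {
        List<byte[]> retained = new ArrayList<>(); // stand-in for a leaking holder
        long[] used = new long[steps];
        for (int i = 0; i < steps; i++) {
            retained.add(new byte[bytesPerStep]);
            used[i] = usedHeapBytes();
        }
        // Touch the list after the loop so it stays reachable throughout.
        if (retained.size() != steps) {
            throw new IllegalStateException("unexpected list size");
        }
        return used;
    }

    public static void main(String[] args) {
        long[] used = measureRetainedGrowth(10, 1 << 20); // 1 MB per step
        for (int i = 0; i < used.length; i++) {
            System.out.printf("after step %d: %d KB used%n", i, used[i] / 1024);
        }
    }
}
```

Calling usedHeapBytes() in setUp() and tearDown() of each test case and
logging the difference gives a first impression of which tests leave data
behind; a heap dump in the profiler can then show what is holding on to it.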