On 02/25/2010 11:23 AM, Rod Montgomery wrote:
I personally have been working under ticket #239. I have not looked at
#255, but on its face it looks like a duplicate of #239.
The system that made me interested in Suds in the first place is being
rebuilt (not by me) and does not currently have a circularly-importing
set of WSDL files. As the rebuilding proceeds, I expect that the
circular-importation problem will re-emerge, because the builder is
using Microsoft (tm) tools, and those tools apparently love to create
circularly-importing sets.
There are several other examples under ticket #239.
I agree with what you say about the perversity of circularly-importing
WSDL files, but I don't have control over the WSDL files I need to use.
I can't speak for the originator of this particular message thread, but
I'm not trying to provoke you (jortel) to work on this. You have your
priorities, which are not necessarily mine, but even more important, I'd
like to contribute something to Suds as partial repayment to you for the
work you've done in building Suds in the first place.
That said, I'd prefer not to make a contribution that breaks what's
already there or that corrupts the structure of what's already there. My
initial attempt to contribute towards a solution of ticket #239 was
apparently unacceptable in that respect, so I'm trying to do better this
time. That's why I'm bothering you with questions.
Alas, I'm afraid I don't understand the answer to my question.
I still don't see how caching intermediate results helps very much, if
at all: I see caching the raw xml files, or their parsed forms, to
reduce network traffic,
The goal for caching xml files isn't (wasn't) to reduce network traffic but to
mitigate the high latency of hitting the servers for what is mostly static
information. It is very costly in terms of time to get these documents from the
server in most cases.
but unless you're going to make a substantial
investment in machinery to figure out what parts of a network of WSDL
files have become stale and need to be reloaded, it looks to me like you
pretty much have to reload everything. I don't see that investment as
worthwhile: the biggest payoff, as I see it, is from caching the
Definitions structures for the Class instances, so the user doesn't have
to rebuild those structures every time he wants to make a new client for
a particular service.
Agreed.
If anything, I'd expect the net payoff for caching intermediate-level
Definitions structures to be *negative*, *especially* in complex cases,
because such caching bloats the cache, increasing the memory footprint
of the application process.
Can be. That is why the caching still needs some work.
I don't understand the statement that the WSDL-loading process -- or
perhaps the process for complex sets of WSDL files? -- "needs to be
optimized in 0.4". Or maybe I misunderstand what "optimized" means in
this context: usually, I think, when someone says "optimized", he means
"runs fast". Maybe the meaning here is not that.
Yes, you misunderstood (actually, I was unclear). "needs to be optimized in 0.4"
was referring to optimizing the caching of intermediate XML files and WSDL
object structures.
The way I see it, the most important thing is that Suds be able to
handle all valid sets of WSDL files. If it doesn't work at all, who
cares how fast it is?
Agreed.
The next most important thing is for Suds to perform well in handling
transactions, given that the Client definition for the service has
already been loaded.
Agreed.
This has two aspects. For applications that make many transactions using
a single Client object, the overwhelmingly important performance is the
performance of the transactions themselves. For some applications,
however, there may also be much setting up and tearing down of
individual Client objects for each service; for such applications, the
performance of that setting up and tearing down -- having the
Definitions structure already in hand -- is also important.
Agreed.
Only third -- and, as I see it, a distant third -- do we reach
performing well in loading the Client definition for a service -- that
is, in building the Definitions structure. Spending development effort
on optimizing that process by caching to eliminate unnecessary network
traffic strikes me as worthwhile, especially when several services all
import the same common WSDL file(s). Spending development effort on
reducing the amount of processor time needed to build the Definitions
structure, given the xml or its parsed form in cache, though, strikes me
as a waste. If you have to complexify the code to reduce that processor
time, and that complexity makes it harder, or impossible, to handle all
valid WSDL file sets, it's an even bigger waste. And if that complexity
somehow got built anyway, I'd personally be inclined to un-build it, if
only to decrease the volume of code within which bugs can breed in the
future.
Ah, agreed but not sure where you're getting this from.
In my reply, I discussed reworking a bit of the WSDL load/merge code to handle circular
imports properly. That's all.
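For illustration only, here is a minimal sketch of one way a loader can tolerate circular imports. It is not Suds code and not the load/merge algorithm from the XSD package: the function name, the `fetch` callable, and the toy `FILES` table are all hypothetical. The idea is to register each document before following its imports, so that A importing B importing A terminates instead of recursing forever.

```python
def load_with_imports(url, fetch, loaded=None):
    """Return {url: document} for `url` and everything it imports, transitively.

    `fetch` is a hypothetical callable: fetch(url) -> (document, [imported urls]).
    """
    if loaded is None:
        loaded = {}
    if url in loaded:
        # Already visited: this is what breaks an A -> B -> A cycle.
        return loaded
    document, imports = fetch(url)
    # Register the document BEFORE following its imports.
    loaded[url] = document
    for imported in imports:
        load_with_imports(imported, fetch, loaded)
    return loaded


# Toy in-memory stand-in for a set of WSDL files (assumption, for the demo only).
FILES = {
    "A": ("<A/>", ["B"]),
    "B": ("<B/>", ["A", "C"]),  # B imports A back: a circular import
    "C": ("<C/>", []),
}

result = load_with_imports("A", FILES.__getitem__)
print(sorted(result))
```

Each file is fetched exactly once regardless of how many other files import it. Merging the loaded documents and resolving references across namespaces is the hard part and is not shown here.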
Jeff Ortel wrote, On 02/25/2010 11:05 AM:
> When evolving the caching, I left the intermediate caching in place so
> that when a top level object needed to be rebuilt, it could potentially
> be done with some cached content rather than downloading all of it. The
> intent was that if the wsdl Definitions object graph expired in the
> cache, some/most of the raw xml files could be read from cache. That
> said, in cases where wsdls are written as a web of many wsdl fragments
> which are imported into each other and the top wsdl, this approach can
> be costly. Most wsdls are written with a sane usage of import. Anyway,
> this needs to be optimized in 0.4.
>
> Now, as for your circular wsdl import issue. I took great care in
> handling these cases in the XSD package (for schemas). Import/Include
> scenarios can be extremely complex. Especially when taking namespaces
> into account and reference resolution. I can apply the same load/merge
> algorithm to the wsdl loading but it will take some effort. The reason
> the wsdl loading is simple in suds is that wsdl import should be linear.
> Except for sadistic pleasure, I can't think of a valid reason to have
> mutual imports in wsdls. I don't mean to imply that your problem isn't
> valid ;) Or, that I don't intend to help. Just giving some background.
>
> I guess what I'm saying is that mutual imports in wsdls are an edge case
> that I am glad to address but that has not been a priority. It will require
> (as you pointed out) a significant overhaul of the way wsdls are loaded
> & merged. If done properly, I think the caching will not interfere.
>
> I'll have to see if I can scope this for 0.4. I'll need access to a good
> test case like yours. Sorry that I don't know off the top of my head: Do
> you have a ticket on this? I see ticket #255. This related to you?
>
> On 02/25/2010 08:00 AM, Rod Montgomery wrote:
>> Suds is caching each WSDL file at least twice, in file reader.py:
>>
>> Class DocumentReader caches the raw sax.parse tree
>>
>> Class DefinitionsReader caches the class Definitions (from file
>> wsdl.py) data structure.
>>
>> If there are imports among WSDL files, even more copies get cached.
>>
>> For example, A imports B imports C -- then the cache will contain
>>
>> raw sax.parse for C
>>
>> Definitions for C
>>
>> raw sax.parse for B
>>
>> Definitions for B with C imported and resolved
>>
>> raw sax.parse for A
>>
>> Definitions for A with B and C imported and resolved
>>
>> If I eliminate the caching of the Definitions below the top level, so
>> that only the raw sax.parse trees get cached, what will break?
>>
>> The reason I want to do this is to simplify the code, to make it
>> easier to modify it to handle WSDL files that have circular
>> importing relationships, such as A imports B and B imports A.
>>
>> Assuming none of the WSDL files changes, the most important thing to
>> cache is the Definitions structure for the top-level
>> WSDL files. That is what the Client class needs to create a new Client
>> instance for a particular service.
>>
>> The only time caching the Definitions structures for WSDL files below
>> top level improves performance is when the same
>> lower-level WSDL file gets imported by several top-level WSDL files.
>>
>> If one of the WSDL files changes, you really need to invalidate the
>> whole cache anyway, or you'll risk using stale copies of
>> Definitions structures for WSDL files that import the changed file.