Category Archives: Indigo

Posts related to Indigo and Web Services

Performance Characteristics of WCF Encoders

As part of the Framework, we ship three MessageEncoders (accessible through the relevant subclass of MessageEncodingBindingElement):

  1. Text – The “classic” web services encoder. Uses a text-based (UTF-8 by default) XML encoding. This is the default encoder used by BasicHttpBinding and WSHttpBinding
  2. MTOM – An interoperable format (though less broadly supported than text) that allows for a more optimized transmission of binary blobs, as they don’t get base64 encoded.
  3. Binary –  A WCF-specific format that avoids base64 encoding your binary blobs, and also uses a dictionary-based algorithm to avoid data duplication. Binary supports “Session Encoders” that get smarter about data usage over the course of the session (through pattern recognition). This is the default encoder used by NetTcpBinding and NetNamedPipeBinding
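Each of these encoders can also be composed explicitly into a CustomBinding through its MessageEncodingBindingElement subclass; a minimal sketch (endpoint and transport details omitted):

```csharp
using System.ServiceModel.Channels;

// Binary over TCP -- roughly what NetTcpBinding gives you by default.
CustomBinding binaryTcp = new CustomBinding(
    new BinaryMessageEncodingBindingElement(),
    new TcpTransportBindingElement());

// Text over HTTP -- roughly what BasicHttpBinding gives you by default.
CustomBinding textHttp = new CustomBinding(
    new TextMessageEncodingBindingElement(),
    new HttpTransportBindingElement());

// MTOM over HTTP, for optimized binary transmission at an interoperable endpoint.
CustomBinding mtomHttp = new CustomBinding(
    new MtomMessageEncodingBindingElement(),
    new HttpTransportBindingElement());
```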

I often get asked “which encoder is the fastest?” (and then “by how much?” :)). As always, the first principle of performance is to measure and tune your exact scenarios to determine if this is a bottleneck for you. That being said, here are some notes on the performance characteristics of our built-in Message Encoders.

Broadly speaking, encoders can impact your performance along two axes: the size of encoded messages, and the CPU load required to generate/consume those encoded messages.

In general, binary has the fastest encoding/decoding speed since it has less to do (usually because there is less data to read/write). This has to do with the dictionary-based optimization characteristics. The speedup is greater over TCP/NamedPipes since the encoder can recognize patterns (and negotiate optimizations) over the course of the session. If both participants are using WCF, then binary is a natural choice for production. (Note that during development, Text may be useful for debugging purposes).

Both binary and MTOM yield much faster processing of binary data (by avoiding the base64 process as well as the associated size bloat). Binary achieves this with inline binary blobs. The MTOM format achieves this through an inline base64 stub that references the binary blob outside of the Infoset. In both cases, the user model abstracts away this detail; the blobs “appear” inline through the encoder.

If you do not have any binary data involved, MTOM will actually be slower than text since it has the extra overhead of packaging and processing the Message within a MIME document. However, if there is enough binary data in the document then the savings from avoiding base64 encoding can make up for this added overhead.

We spent a lot of engineering effort tuning the performance of our UTF-8 Text encoder, so you will see better performance with UTF-8 than with the Unicode variations. And as to whether you should use Text or MTOM for interoperable endpoints, the guidance above should help with gut feel, but please measure your scenarios!
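On the standard HTTP bindings, switching between Text and MTOM is a single property; a sketch using WSHttpBinding:

```csharp
using System.ServiceModel;
using System.Text;

WSHttpBinding binding = new WSHttpBinding();

// Text with UTF-8 is the default; stated here for clarity.
binding.MessageEncoding = WSMessageEncoding.Text;
binding.TextEncoding = Encoding.UTF8;

// Or opt into MTOM when there is enough binary data
// to pay for the MIME packaging overhead.
binding.MessageEncoding = WSMessageEncoding.Mtom;
```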

Behind the protected BindingElement ctor

On our BindingElement class, there is a protected ctor that takes another BindingElement as its parameter. This constructor exists in order to facilitate a composable implementation of BindingElement.Clone.  When writing a custom binding element, first implement a protected copy constructor as follows (note that for sealed classes this ctor should be private):

protected MyBindingElement(MyBindingElement elementToBeCloned)
    : base(elementToBeCloned)
{
    // copy all fields from elementToBeCloned.XXX to this.XXX
}

Then you should implement your Clone() method as follows:

public override BindingElement Clone()
{
  return new MyBindingElement(this);
}

Any BindingElement in your inheritance chain (assuming it has followed this pattern) will then copy over the relevant values in its copy constructor, so that you can be assured a full Clone of your custom binding element.
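Putting the pattern together, here is a sketch of a hypothetical sealed binding element (myQuota is an invented field standing in for your element’s state; note the copy ctor is private since the class is sealed):

```csharp
using System.ServiceModel.Channels;

public sealed class MyBindingElement : BindingElement
{
    int myQuota; // invented field, stands in for your element's state

    public MyBindingElement()
    {
    }

    // Private (not protected) because the class is sealed.
    MyBindingElement(MyBindingElement elementToBeCloned)
        : base(elementToBeCloned)
    {
        this.myQuota = elementToBeCloned.myQuota;
    }

    public override BindingElement Clone()
    {
        // The copy ctor chain performs the actual field-by-field copy.
        return new MyBindingElement(this);
    }

    public override T GetProperty<T>(BindingContext context)
    {
        return context.GetInnerProperty<T>();
    }
}
```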

Auto-open and multi-thread usage of client channels

Buddhike hit a hiccup the other day with a multi-threaded client that bears explanation.

The Channel layer always requires an explicit Open() before it can be used. This enforces our CommunicationObject state machine. As a usability feature, our ServiceChannel proxy code supports “auto-open”. That is, you can call a proxy method without explicitly calling Open and the runtime will call Open() on your behalf. This is transparent in the case where you are using a proxy synchronously from a single thread. However, if you are using a proxy asynchronously (or from multiple threads), you may have the case that the Open() is associated with the first request, but subsequent requests are also pending.

Since the state machine is that Open() must complete before Send/Receive are valid operations, none of the requests can proceed until Open completes. In the shipping code, this synchronization is actually around the entire ServiceChannel call, and so Buddhike was seeing an excessive delay. We’ll investigate for the next version whether there’s a way to unblock earlier on the client, while still providing all of our existing behavioral guarantees. In the interim, I recommend two things when using a client asynchronously and/or from multiple threads concurrently:

  1. Open your client explicitly prior to usage. You can do this synchronously or asynchronously depending on your application
  2. Prefer calling your client asynchronously to spinning up multiple threads for synchronous calls if you want better scalability/thread-usage
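A sketch of recommendation #1 (ICalculator and the endpoint configuration name are hypothetical):

```csharp
using System;
using System.ServiceModel;

ChannelFactory<ICalculator> factory =
    new ChannelFactory<ICalculator>("calculatorEndpoint"); // hypothetical config name
ICalculator proxy = factory.CreateChannel();

// Open explicitly before any concurrent or asynchronous usage begins,
// so no request gets stuck behind the implicit auto-open.
((IClientChannel)proxy).Open();

// Or asynchronously:
// IAsyncResult result = ((IClientChannel)proxy).BeginOpen(null, null);
// ... do other work ...
// ((IClientChannel)proxy).EndOpen(result);
```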

Signalling "End Of Session"

When authoring a session-ful channel, it’s important to signal “end of session” correctly so that the runtime (or any other user of the channel) knows when to stop reading messages and to start shutting down its side of the conversation (with CloseOutputSession and/or channel.Close). A null Message/RequestContext signals end-of-session to the caller. In particular, depending on your channel shape, you should do the following:

  • IInputSessionChannel/IDuplexSessionChannel: Return null from channel.Receive(). Correspondingly, return true from TryReceive with the “message” out-param set to null. And of course, cover your bases by having BeginTryReceive complete synchronously with a signal to return true + message = null from EndTryReceive.
  • IRequestSessionChannel: Return null from channel.ReceiveRequest(). Correspondingly, return true from TryReceiveRequest with the “context” out-param set to null. Lastly, have BeginTryReceiveRequest complete synchronously with a signal to return true + context = null from EndTryReceiveRequest.
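From the consumer’s side, the end-of-session contract looks like this (a sketch against IDuplexSessionChannel; ProcessMessage is a hypothetical application handler):

```csharp
using System.ServiceModel.Channels;

void DrainSession(IDuplexSessionChannel channel)
{
    Message message;

    // A null return from Receive() signals end-of-session.
    while ((message = channel.Receive()) != null)
    {
        ProcessMessage(message); // hypothetical application handler
    }

    // The session is drained: shut down our side of the conversation.
    channel.Session.CloseOutputSession();
    channel.Close();
}
```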

Throttling in WCF

When your server is hosted out on the “big bad internet”, you need a way to make sure that you don’t get flooded with client requests. In WCF, our services support throttling as a way of mitigating potential DoS (denial of service) attacks. These throttles can also help you smooth load on your server and help enforce resource allocations. There are three service-level throttles that are controlled by ServiceThrottlingBehavior. These are in addition to any transport-specific throttles imposed by your binding. To fully understand the impact of these throttles you should also understand the threading/instancing characteristics of your service.

  1. MaxConcurrentCalls bounds the total number of simultaneous calls that we will process (default == 16). This is the only normalized throttle we have across all of the outstanding reads that the ServiceModel Dispatcher will perform on any channels it accepts. Each call corresponds to a Message received from the top of the server-side channel stack. If you set this high then you are saying that you have the resources to handle that many calls simultaneously. In practice how many calls will come in also depends on your ConcurrencyMode and InstancingMode.
  2. MaxConcurrentSessions bounds the total number of sessionful channels that we will accept (default == 10). When we hit this throttle then new channels will not be accepted/opened. Note that this throttle is effectively disabled for non-sessionful channels (such as default BasicHttpBinding).

    With TCP and Pipes, we don’t ack the preamble until channel.Open() time. So if you see clients timing out waiting for a “preamble response”, then it’s possible that the target server has reached this throttle. By default your clients will wait a full minute (our default SendTimeout), and then time out with a busy server. Your stack will look something like:

    TestFailed System.TimeoutException: The open operation did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout.
    […]
    at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.SendPreamble(IConnection connection, ArraySegment`1 preamble, TimeoutHelper& timeoutHelper)

    If instead you are timing out under channel.Send (rather than channel.Open), then it’s possible that you are hitting the MaxConcurrentCalls throttle (which kicks in per-message, not per-channel).

  3. MaxConcurrentInstances bounds the total number of instances created. This throttle provides added protection in the case that you have an instance lifetime that is not tied to a call or a session (in which case it would already be bounded by the other two throttles). Orcas durable services are one such scenario.
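All three knobs live on ServiceThrottlingBehavior; a sketch of raising them in code (MyService is a hypothetical service type, and the values are illustrative, not recommendations):

```csharp
using System.ServiceModel;
using System.ServiceModel.Description;

ServiceHost host = new ServiceHost(typeof(MyService)); // hypothetical service

// Reuse the existing behavior if one was configured, else add one.
ServiceThrottlingBehavior throttle =
    host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
if (throttle == null)
{
    throttle = new ServiceThrottlingBehavior();
    host.Description.Behaviors.Add(throttle);
}

throttle.MaxConcurrentCalls = 32;     // default is 16
throttle.MaxConcurrentSessions = 20;  // default is 10
throttle.MaxConcurrentInstances = 32; // illustrative value

host.Open();
```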

Net-net: if you are testing your services under load, and your clients start timing out, take a look at your throttling and instancing values. On the flip side, do not just blindly set these to int.MaxValue without fully understanding the potential DoS consequences.

InstanceContextMode, ConcurrencyMode, and Server-side Threading

When trying to write a scalable web service, you need to be aware of a few properties that affect how the WCF runtime will dispatch requests to your service: InstanceContextMode and ConcurrencyMode. In a nutshell, InstanceContextMode controls when a new instance of your service type is created, and ConcurrencyMode controls how many requests can be serviced simultaneously. The default InstanceContextMode is InstanceContextMode.PerSession, and the default ConcurrencyMode is ConcurrencyMode.Single.

Others have covered these two knobs in detail; you can check them out for more background. Here I’m simply going to explain the effect these settings can have on your threading behavior.

If you have set ConcurrencyMode == ConcurrencyMode.Single, then you don’t have to worry about your service instances being free-threaded (unless you are doing concurrent work within your methods). The only time multiple calls are allowed is when there are multiple instances. For InstanceContextMode.Single, you will get one method call at a time since you only have a single instance. For InstanceContextMode.PerCall or PerSession, ServiceModel will spin up extra threads (up to a throttle) in order to handle extra requests. There’s one possibly unexpected twist here: when using a session-ful binding there will only be one outstanding instance call per channel, even with InstanceContextMode.PerCall. This is because WCF strictly maintains the in-order delivery guarantees of the channel with ConcurrencyMode.Single. So when using a session-ful binding (i.e. the default NetTcpBinding or NetNamedPipeBinding) + ConcurrencyMode.Single, InstanceContextMode.PerCall and InstanceContextMode.PerSession will behave exactly the same way from a server-side threading/throttling perspective with regard to a single channel.

ConcurrencyMode == ConcurrencyMode.Reentrant is similar, except you can trigger another call to your instance from within your service (let’s say you call into a second service that calls back into you before it returns).

When you have ConcurrencyMode == ConcurrencyMode.Multiple, threading comes heavily into play. WCF will call into your instance on multiple threads unless you are using InstanceContextMode.PerCall. Throttles again will come into play (that’s a topic for another post).

To summarize, here are some basic scenarios for what happens when 100 clients simultaneously hit a service method:

Scenario 1: InstanceContextMode.Single + ConcurrencyMode.Single
Result: 100 sequential invocations of the service method on one thread.

Scenario 2: InstanceContextMode.Single + ConcurrencyMode.Multiple
Result: N concurrent invocations of the service method on N threads, where N is determined by the service throttle.

Scenario 3: InstanceContextMode.PerCall + any ConcurrencyMode
Result: N concurrent invocations of the method on N service instances, where N is determined by the service throttle.
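Both knobs are declared through ServiceBehaviorAttribute on the service type; for example, Scenario 2 above would look like this (CalculatorService and ICalculator are hypothetical names):

```csharp
using System.ServiceModel;

[ServiceBehavior(
    InstanceContextMode = InstanceContextMode.Single,
    ConcurrencyMode = ConcurrencyMode.Multiple)]
public class CalculatorService : ICalculator // hypothetical contract
{
    // This single instance must be thread-safe: WCF will dispatch
    // concurrent calls to it on multiple threads, bounded by the throttle.
}
```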

More details on MaxConnections

Here are some more “under the hood” details in response to the following question:

How is one supposed to interpret the NetTcpBinding.MaxConnections property on the client? My assumption has been that setting this property at the client only allows this number of concurrent connections, and further connection attempts will be queued until an existing connection is released.

MaxConnections for TCP is not a hard and fast limit, but rather a knob on the connections that we will cache in our connection pool. That is, if you set MaxConnections=2, you can still open 4 client channels on the same factory simultaneously. However, when you close all of these channels, we will only keep two of these connections around (subject to IdleTimeout of course) for future channel usage. This helps performance in cases where you are creating and disposing client channels. This knob will also apply to the equivalent usage on the server-side as well (that is, when a server-side channel is closed, if we have less than MaxConnections in our server-side pool we will initiate I/O to look for another new client channel).

The reason that we don’t have a hard and fast limit on your connection usage is that you can already control connection usage through your usage of the WCF objects. That is, if you don’t want to use more than two connections, don’t create more than two client channels :) Any additional knobs at the lower layer would only impede debuggability and predictability.
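To make the pool-cache semantics concrete, a sketch:

```csharp
using System.ServiceModel;

NetTcpBinding binding = new NetTcpBinding();

// Not a hard cap: you can still open more than two channels simultaneously.
// On close, at most two idle connections are kept (subject to IdleTimeout)
// for reuse by future channels.
binding.MaxConnections = 2;
```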

Note that MaxConnections applies across channels. When sending messages over a single channel, you can only send out one message at a time. That is, your second Send() call will not be initiated until your first Send() completes. In this manner, our TCP binding can guarantee in-order delivery. Also, practically speaking there would be a significant amount of complexity (and overall a negative performance hit) if we allowed interleaving of data from multiple messages, as each “chunk” would need to be annotated with a scatter/gather message marker.

Lastly, all of the above comments apply to TransferMode.Buffered (the default). When using Streaming mode, we “check out” a connection for each in-progress send (and not per-channel). So all the above statements will apply to simultaneous sends rather than simultaneous channels. Streaming TCP is a datagram (not a session-ful) channel, and so simultaneous sends are supported since each send will use a separate TCP connection. This is more similar to HTTP’s usage of TCP connections (where each in-flight request-response pair is using a separate TCP connection).

Client (TCP and Named Pipe) Connection Pooling

Using the TCP and Named Pipe bindings gives you a very clean mapping between IDuplexSessionChannel and the underlying network resource (socket or pipe). Namely, you can effectively treat a channel as 1-1 with a socket (I will use socket as shorthand for the generic “network resource” for the remainder of this post :)).

That being said, the lifetime of the underlying socket is not necessarily 1-1 with the lifetime of the channel. Due to our connection pooling feature in WCF, a connection can be reused over the lifetime of multiple channels. We perform connection pooling for both buffered and streaming channels. Our connection pool is configurable through TcpConnectionPoolSettings/NamedPipeConnectionPoolSettings. These settings include a GroupName that we use for isolation, an upper bound on our cache size (MaxOutboundConnectionsPerEndpoint), and timeout values for reliability and NLB support.

The way connection pooling works on the client is as follows:

  • When you open a channel we will first look for a connection in our pool. This lookup is performed based on IP+port for sockets and based on endpoint Uri name for Pipes.
  • If we find an available connection in our pool then we will attempt our open handshake using .Net Framing. If this succeeds then we will associate the connection with the new channel and return from Open. If it fails then we’ll discard the connection. If we have not yet exceeded the binding’s OpenTimeout then we will repeat the “look in pool” process.
  • If no [valid] connections are found in our pool then we will establish a new connection (again, using up to the time remaining in OpenTimeout).
  • When you close a channel, after we perform our close handshake we will consider returning the connection to our pool. If we already have reached MaxOutboundConnectionsPerEndpoint, or the connection’s lifetime has exceeded LeaseTimeout then we will close the connection instead. The connection that is returned to the pool is the “raw” connection (the one that was initially accepted, prior to any security upgrades). In this way we can provide a transparent pool without leaking any security or other information.
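These knobs are reachable from the transport binding element in a custom binding; a sketch (the group name is invented, and the values shown happen to match the defaults):

```csharp
using System;
using System.ServiceModel.Channels;

TcpTransportBindingElement transport = new TcpTransportBindingElement();

TcpConnectionPoolSettings pool = transport.ConnectionPoolSettings;
pool.GroupName = "MyPoolGroup";              // invented name; isolates this pool
pool.MaxOutboundConnectionsPerEndpoint = 10; // upper bound on the cache
pool.IdleTimeout = TimeSpan.FromMinutes(2);  // drop idle pooled connections
pool.LeaseTimeout = TimeSpan.FromMinutes(5); // bound connection lifetime (NLB)

CustomBinding binding = new CustomBinding(
    new BinaryMessageEncodingBindingElement(), transport);
```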

I was going to cover the server-side usage of connection pooling in this same post, but the process of accepting and reusing connections on the server is worthy of its own topic next time ;)