I’ve gotten some questions recently about how our net.tcp transport functions at the memory allocation level. While at its core this is simply an “implementation detail”, this information can be very useful for tuning high performance routers and servers depending on your network topology.
When writing a service on top of net.tcp, there are a few layers where buffer allocations and copies can occur:
- Buffers passed into socket.Receive()
- Buffers created by the Message class (to support Message copying, etc)
- Buffers created by the Serialization stack (to convert from Message to/from parameters)
In addition, this behavior differs based on your TransferMode (Buffered or Streamed). In general you’ll find that Buffered mode will provide you the highest performance for “small messages” (i.e. < 4MB or so), while Streamed will provide better performance for larger messages (i.e. > 4MB). Usually this switch is combined with a tweak of your MaxBufferPoolSize in order to avoid memory thrashing.
TransferMode.Buffered
In Buffered mode, the transport will reuse a fixed-size buffer on its calls to socket.Receive(). The ConnectionBufferSize setting on TcpTransportBindingElement controls the size of this buffer.
The data read from the wire is incrementally parsed at the .Net Message Framing level, and is copied into a new buffer (allocated by the BufferManager) that is sized to the length of the message. This step always involves a buffer copy, but rarely involves a new allocation for small messages (since BufferManager will cache and recycle previously used buffers).
The message-sized buffer is then passed to the MessageEncoder for Message construction. Again, minimal allocations should occur here since we will pool XmlReader/Writer instances where possible. At this point you have a Message that is backed by a fixed-size buffer and we will share that backing buffer for all copies of the Message created through message.CreateBufferedCopy().
If your contract is Message/Message, then no more deserialization will occur. If you are using Stream parameters, then the allocations are determined by your code (since you are creating byte[]s to pass to stream.Read), and given the nature of the Stream APIs you will entail a buffer copy as well (since we have to fill the caller buffer with data stored in the callee).
If your contract uses CLR object parameters, then the Serializer will process the Message and generate CLR objects as per your Data Contracts. This step often entails a number of allocations (as you would expect since you are generating a completely new set of constructs).
TransferMode.Streamed
There are three main differences that occur when you are using TransferMode.Streamed:
- The transport will use new variable-sized byte[]s in its calls to socket.Receive. These byte[]s are still allocated from the configured BufferManager. The transport will then offer up a scatter/gather-based stream to the MessageEncoder so there are no extra buffer allocations or copies in this case.
- The Message created from the Encoder will be a “Streamed” message. This means that if you simply Read() from the Message’s Body in a single-shot then we don’t allocate any extra memory. However, if you call message.CreateBufferedCopy() we will entail some large allocations since we need to fully buffer at that point. Note that some binding elements (such as ReliableSession and certain Security configurations) will call CreateBufferedCopy() under the hood and trigger this situation.
- Since the backing data may not all be in memory, invoking the Serializer will incur a larger cost since it will be pulling all the data into memory at once (in order to generate the appropriate CLR objects, even if that is simply a byte[]).
As you can see, there is a tension between performance and usability. Hopefully this data will help you make the appropriate tradeoffs.
Using NetTcpBinding, StreamedResponse, no signing, no encryption, security mode none, I have measured that WCF itself allocates a total of approximate 116 KB of managed memory server side for every stream, that is, the overhead is independent of how many bytes are transferred through the stream, and the overhead is big…
Can you please look at this issue?
Thanks,
Niels