Thursday, December 20, 2007

Web Services and Files Exchange

SOAP based Web Services are now very common in enterprise architectures, and quite often applications that consume or publish services need to send binary content such as images, PDF or Word documents (or anything you have in mind...). SOAP and XML provide different ways to achieve this. So what are the challenges around binary data exchange using SOAP based Web Services?
  • The main goal of Web Services is interoperability, so when you are offering a service, you need to be careful about the technical choices you are making. SOAP has been a great success in terms of interoperability. I am aware that REST is also a very good fit for that, but since I talk about SOAP and later WS-* standards, I do not want to say more about REST in this post. The only thing to keep in mind before choosing to implement SOAP based Web Services is to ask yourself: "do I really need SOAP services, or would REST be enough?"... That said, let's continue on SOAP and binary content exchange. When talking about binary content, interoperability comes with trade-offs, for example around performance/message size or impact on developers. This will be discussed later, but always keep in mind that interoperability is the key point of Web Services. If this is not the case on your project, you probably do not need SOAP, which carries an important overhead in general.
  • Performance and scalability are also quite important when you are building a service based application, especially since you often cannot predict exactly how much a service will be used. Keep in mind that services are often built to be reusable, one of the basic best practices of development, so if the service really is "reused" it is important to keep it running with acceptable performance. This is why, when talking about binary content with SOAP, it is important to look at its impact on message size and processing cost.
  • Composability is also quite important when using SOAP. In the context of binary content exchange with XML/SOAP, it is important to support composability with the WS-* standards, and to do so efficiently. For example, a service that is using WS-Security to sign parts of its messages should be able to sign an attached PDF document using the same standard.
  • Impact on development: when choosing the way binary content is exchanged with SOAP, it is also interesting to see how much impact it has on the development itself. Must a developer import specific APIs to be sure that the binary content is properly sent or consumed by the server or client? Note: I will talk about Java here, and particularly about JAX-WS/JAX-RPC since those are the stacks I know best, but the remarks would be the same for all technologies.
Let's now dive into the different options that are offered to a developer/architect to exchange documents using SOAP:
  • XML Base64 encoding
  • SOAP With Attachment (SwA) using MIME (Multipurpose Internet Mail Extensions)
  • SOAP With Attachment using DIME (Direct Internet Message Encapsulation)
  • Message Transmission Optimization Mechanism (MTOM)
First of all, I will not talk in detail about the third option, SOAP with Attachment using DIME, for a simple reason: this approach was pushed by Microsoft around 2003/2004 and is now deprecated in favor of MTOM.

Base64 Encoding
Base64 is part of the core XML capabilities, and using it to exchange binary content in a SOAP message has some very good advantages:
  • Since it is part of XML itself, it has great interoperability; I can say that all stacks will be able to consume or send messages that contain Base64 data.
  • For the same reason it does not have any impact on development: most of the Java stacks will automatically use base64 encoding when byte[] parameters are present (see the sketch after this list).
  • Again because it is 100% XML based, the composability with the other XML/WS-* standards is very good.
  • So far everything looks great for this approach, but the trade-off is the following: Base64 encoding is not efficient, since a lot of CPU is used to encode and decode the binary content. In addition, the encoded data is roughly 33% bigger than the binary content itself. (It can still be used for small payloads.)
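
To make this concrete, here is a minimal sketch of the base64 case with JAX-WS (the service and method names are hypothetical). A plain byte[] parameter or return type is mapped to xs:base64Binary in the generated WSDL, so the developer never touches the encoding:

```java
import javax.jws.WebMethod;
import javax.jws.WebService;

// Minimal JAX-WS endpoint: the byte[] return type is mapped to
// xs:base64Binary, so the stack base64-encodes the document inside
// the SOAP body itself -- no attachment-specific API is involved.
@WebService
public class DocumentService {

    @WebMethod
    public byte[] fetchDocument(String documentId) {
        // Hypothetical lookup: a real service would load the PDF/image
        // from a repository; the base64 encoding is done by the stack.
        return new byte[0];
    }
}
```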
SOAP with Attachment (SwA)
The SOAP with Attachment specification is the first effort of the Web Services industry to solve the problem of binary content with SOAP. The idea is to carry the binary content outside the SOAP envelope, in MIME parts, with the SOAP message only referencing it. In addition to the W3C Note, the WS-Interoperability organization has extended this recommendation with a basic attachments profile to enforce its interoperability, using the SOAP with Attachment Reference (swaRef).
  • The good part of SwA is the fact that it has been published as a W3C Note and also adopted by the WS-I organization. But in fact the interoperability is not that great, mainly because none of the Microsoft Web Services solutions support SwA. It is true that most of the Java stacks, starting with the standard JAX-RPC/JAX-WS, support SwA and swaRef, but that is not enough to call it good interoperability.
  • The reason why Microsoft refused to implement it, and why it is only a W3C Note (and not a Recommendation), is that SOAP with Attachment has poor composability. It is hard to use the WS-* standards with SwA because it breaks part of the model: the document bypasses the SOAP/XML processing and is simply put in a MIME part, with only a reference to it inside the SOAP message.
  • SOAP with Attachment is efficient, precisely because of the previous point: the SOAP stack does not really deal with the content and just streams it into the MIME part.
  • When used with JAX-RPC and JAX-WS, it has an impact on the developer, who must use specific Java APIs to build the service and put specific data types in the WSDL. The impact on development is not large, but the developer still has to think about providing the right method signature or WSDL entry to enforce the use of SwA/swaRef in the service, whereas I believe most developers would expect this to be transparent (see the sketch below).
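
As a hedged illustration of that developer impact, here is what a swaRef-style endpoint can look like with JAX-WS (names are hypothetical): the DataHandler parameter has to be marked with the JAXB @XmlAttachmentRef annotation so that the WSDL uses the wsi:swaRef type and the content travels as a MIME attachment:

```java
import javax.activation.DataHandler;
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.bind.annotation.XmlAttachmentRef;

// Sketch of a JAX-WS endpoint using swaRef: the annotated DataHandler is
// sent as a MIME attachment, while the SOAP body only carries a cid:
// reference to it. The developer has to opt in explicitly.
@WebService
public class UploadService {

    @WebMethod
    public void uploadDocument(String name,
                               @XmlAttachmentRef DataHandler content) {
        // The stack resolves the cid: reference and hands us the raw
        // content; content.getInputStream() would give the bytes.
    }
}
```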
Message Transmission Optimization Mechanism (MTOM)
The last mechanism is also based on MIME on the wire to exchange the binary content, but the way the message (SOAP+MIME) is built is totally different from the previous SwA approach. MTOM builds on the "experience" of the other mechanisms to support composability without hurting performance or development.
  • Interoperability is virtually great. It is great because it has been pushed by major vendors such as IBM, Microsoft, BEA and Oracle, and it is a W3C Recommendation, so interoperability should be good. I say "virtually" because to be interoperable the various Web Services stacks must implement it, and that is not the case everywhere yet. Today, most of the latest stacks support MTOM, so it should not be an issue if you are starting a project.
  • Composability is perfect, since MTOM does use the SOAP envelope but provides an automatic and transparent optimization that puts the binary content in a MIME part. During the serialization of the message, the SOAP engine works with a temporary base64 representation of the content, allowing all the WS-* operations needed, for example an XML signature, but without the overhead of carrying base64 over the wire.
  • MTOM appears to be the most efficient way of dealing with large documents and SOAP.
  • Because MTOM uses the same approach as the pure XML base64 process, it does not have any impact on development. In fact it is the Web Services stack that chooses to use base64 (embedding the document) or MTOM over the wire, and this can be done in conjunction with a WS-Policy. As you can see in WS-MTOMPolicy, this is not under the control of the developer; it is up to the administrator and the applications to choose whether or not to use MTOM. (A minimal sketch follows.)
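
A minimal sketch, assuming a JAX-WS 2.1 stack such as Metro: the contract is the same as in the base64 example above, and switching to MTOM is a one-line change that does not touch the business code (class and method names are hypothetical):

```java
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.soap.MTOM;

// Same contract as the base64 version; only the @MTOM annotation is added.
// The stack then serializes the byte[] as an optimized XOP/MIME part
// instead of inlined base64 text.
@MTOM
@WebService
public class DocumentService {

    @WebMethod
    public byte[] fetchDocument(String documentId) {
        return new byte[0]; // placeholder body for the sketch
    }
}

// On the client side MTOM is also just a binding switch, for example:
//   SOAPBinding binding = (SOAPBinding) ((BindingProvider) port).getBinding();
//   binding.setMTOMEnabled(true);
```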
But... which one should I use?
Based on the different points described earlier, it looks like MTOM is the way to go; even if this is true, it cannot be summarized to just that. First of all, MTOM is not supported by all the stacks, so if you cannot control the consumers of your services and cannot impose a modern stack, MTOM may not be the best approach. For me, the second on the list is the Base64 approach, because of its high interoperability, but it is important to remember that it has an impact on performance/processing. I personally would not push SwA because of its lack of support in the Microsoft world... As you know, the world is not yet 100% Java based ;). Let's take a look at which stacks support MTOM today:
  • JAX-WS reference implementation (and Metro)
  • IBM Websphere 6.x with SOA Feature Pack
  • BEA Weblogic 10
  • OracleAS 10gR3 (10.1.3.1) JAX-RPC and FMW 11 preview (JAX-RPC and JAX-WS)
  • Axis2
  • XFire
  • JBossWS
You can find more information in these comparison matrices: Apache WS Stack Comparison and XFire Comparison Matrix. (These two are probably very interesting to keep... unfortunately they do not contain any MSFT data. I had one in the past but cannot find it... if you have such a matrix, feel free to post it in a comment.)

Monday, December 10, 2007

Working on a large XML or SOA project: think about "separation of concerns"

With XML and SOA becoming mainstream in the enterprise, XML operations such as schema validation and XSL transformation are now very common. These operations are CPU intensive and can become a performance bottleneck when applied directly on the middleware. It can be even worse when using SOAP based Web Services and their related WS-* standards: with WS-Security, for example, XML encryption and signature are used more and more in SOA based applications.

This is why many enterprise architects are now looking for solutions to improve the performance of XML centric applications.

One of the things we learn when developing applications, and that Aspect Oriented Programming has highlighted, is the concept of "separation of concerns". It is key to keep that in mind in the global architecture too, in our case by separating the XML processing from the business logic. Fortunately, most of the time this is done directly by the Web Services framework you are using: you do not code the SOAP request/response, it is hidden by the framework.

However, in current application servers, the full XML processing is done directly in the container; for example, the XML encryption happens in the same container as the pure business logic. So let's find a solution to extract the most intensive XML processing into another part of the system.
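
To give an idea of the work we want to extract, here is the kind of CPU-intensive XML processing typically executed inside the container today, using the standard JAXP validation API (file names are hypothetical); this is exactly what an appliance can take over at the network edge:

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Schema validation of an incoming message: parsing plus validation is
// pure CPU work that does not depend on any business logic, which makes
// it a natural candidate to offload to an XML appliance.
public class MessageValidation {

    public static void main(String[] args) throws Exception {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("order.xsd"));
        Validator validator = schema.newValidator();
        validator.validate(new StreamSource(new File("order.xml")));
        System.out.println("message is schema-valid");
    }
}
```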

Vendors now have appliances in their catalogs that can do the job. The same way we use SSL accelerators today to deal with SSL encryption/decryption, we can put an XML appliance in place to deal with the CPU intensive processing operations: XML validation, transformation, WS-Security enforcement point, ...

Architecture Overview
The overall architecture could be represented using the following schema:

The role of the XML/SOA appliance varies a lot depending on your project:

  • Simple XML firewall to check the validity of the XML/SOAP messages
  • Web Services access control: look up the enterprise directory to check authentication and authorization. This can be based on the WS-Security standards and their various tokens (username, SAML, ...)
  • Content generation and transformation: the appliance can be used to serve various devices, for example a WAP cell phone or a simple HTML Web client. The XSL transformation is done in a very efficient way directly in the appliance.
  • Services virtualization: it is possible to route the different messages to various endpoints depending on simple rules (business or IT system rules).

As you can see, from an architecture point of view XML appliances are very interesting to offload the heavy XML processing to specific hardware. I have noticed that developers/architects sometimes hesitate to put another piece of hardware/software in their design, but I do think that in this specific case it is probably a good move.

Separating the concerns is quite easy and very clean when dealing with XML processing, and it also allows the overall architecture to be managed in a better way. This kind of appliance allows administrators to centralize the management of policies and transformations. A side effect is that, when dealing with Web Services, you can easily add WS-* support to many stacks that do not natively support them.

XML/SOA Appliances Offering
I said earlier that vendors are offering such products; here are some of the products I have encountered or pushed:

What's next?
Some of you will probably point out that the application server, especially when dealing with Web Services, must still parse the XML/SOAP request even if this has already been done by the appliance. Yes, it is true, but I am sure that in the near future the vendors of such solutions will optimize this, for example by providing support for binary XML or any other approach that will further improve the performance of the overall IT in complex enterprise architectures. But for this, application servers must support binary XML first, to avoid proprietary approaches.

Another point of view that I have not talked about is the possible support of such appliances for Web 2.0/Ajax optimization. I have not dived into this yet, but I am sure we can do very interesting things there too.

Finally, if you have experience with any XML/SOA appliance, feel free to post a comment about it; it will help readers see the interest (or not) around this topic.

Monday, December 3, 2007

Portal Project: Time to Think about your Social Networking Enterprise Strategy

Most enterprises these days have already put a portal in place -- with more or less success. These projects have started, most of the time, with the goal of providing personalized information to users and communities. When working on a portal project you probably define many objectives, represented from a technical point of view by the following features:

  • communities and groups of users
  • easy content management, allowing people to communicate and share
  • data integration of many types of data related to the information needed by each user/community.
I do not see anything special there, except that these are exactly the same goals that most Web 2.0/social networking applications have:
  • User management and creation of communities is something that you do on a daily basis with Facebook and LinkedIn (and any equivalent sites). The key here is the fact that it is the user who defines his own community, not an administrator who does not know "my" business and puts me in a specific bucket.
  • Content management: blogs are a very good example of communication from a single person (or team) to the rest of the enterprise (or the rest of the world). Wikis are tools that help you share content with other people in a very efficient way. If you work with open source you see that most projects use a wiki to communicate with their users. This is true for the documentation, but also for any type of content related to a project.
  • Data integration: Web 2.0 is all about RSS feeds and mashups, which are, at least today, one of the most efficient ways, from a user perspective, to integrate content.
This is why I believe that if you are thinking about an enterprise portal for your organization today, it is probably time to step back a little and think more about:
  • the “enterprise social networking strategy”, which is often also related to the “Web 2.0 enterprise strategy”.
So some ideas to approach this:
  • create a community with some Internet tools, for example start by creating a network in Facebook for your enterprise. Some of you will probably think that it is not productive for the enterprise... Hmm, I have to say that it is not directly, but at least it helps people become familiar with a new way of using the computer and the Internet. Companies I am working in or with have their network already (Sogeti, CapGemini, IBM, Oracle, ...). An example of this is the way Serena is using Facebook as part of their intranet and as a tool to do better business (and some people's reactions to this: FaceBook Friday: Bad Idea).
  • if your challenges are around content management, start by installing a wiki in house or using one on the Internet. I am sure you will be surprised by the adoption and use in your team. I have seen many cases where a regular "Portal/CMS" failed regarding "community sharing" where wikis have been a great success.
  • if your challenges are around data integration, I would encourage you to learn more about REST/RSS and the other technologies used in mashups. It is true that this one will probably need more effort from IT to provide the right content feeds, but instead of delivering the data already packaged in an HTML view (a portlet?), send only the XML using a proper format (RSS/ATOM) and give the users the right tools, then see how they will consume it. (A minimal feed sketch follows this list.)
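
For illustration only, here is what such a raw feed could look like as a minimal RSS 2.0 document (titles, URLs and figures are invented for the sketch); the point is that IT publishes the data, and the user decides how to render it:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical feed: names, URLs and figures are illustrative only -->
<rss version="2.0">
  <channel>
    <title>Sales Dashboard</title>
    <link>http://intranet.example.com/sales</link>
    <description>Daily sales figures published by IT as a raw feed</description>
    <item>
      <title>Orders for 2007-12-03</title>
      <link>http://intranet.example.com/sales/2007-12-03</link>
      <description>1,245 orders processed</description>
      <pubDate>Mon, 03 Dec 2007 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```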
I see this approach as more business oriented: it empowers the business users by giving them an infrastructure in which they select their own tools. A "Portal the Darwin way" kind of approach. So if you have an existing portal, or more importantly if you are thinking about starting a portal project, add the “Web 2.0/Social Networking strategy” question to your plan. And to be honest, asking yourself this question about Web 2.0/social networking does not cost that much, but could probably help you satisfy your end users and customers... Note: I am deliberately mixing up Web 2.0 (technologies) and social networking (behavior), since the two are intimately linked, Web 2.0 being the set of tools and technologies facilitating the social networking.