5.4 Advanced Topics
The following sections describe advanced HTTP requests, and how they can be performed using the RWHttpClient and RWHttpAgent classes.
5.4.1 Additional HTTP Methods
The RWHttpRequest object includes the following identifiers for some HTTP methods:
| Identifier | HTTP Method |
| --- | --- |
| RWHttpRequest::Connect | CONNECT |
| RWHttpRequest::Delete | DELETE |
| RWHttpRequest::Get | GET |
| RWHttpRequest::Head | HEAD |
| RWHttpRequest::Options | OPTIONS |
| RWHttpRequest::Post | POST |
| RWHttpRequest::Put | PUT |
| RWHttpRequest::Trace | TRACE |
The HTTP specification and the RWHttpRequest class enable you to create custom HTTP methods. Example 11 shows a custom RWHttpRequest based on the custom M-POST HTTP method.
Servers and files shown in the code might not exist and are included as examples only.
Example 11 – Creating a custom HTTP method
RWHttpRequestStringBody body("user=roguewave"); // 1
RWHttpHeaderList headerlist; // 2
RWHttpRequest request("M-POST", "/script.cgi", headerlist,
                      body); // 3
//1 Constructs an RWHttpRequestStringBody containing the information that is passed to the server as the body of the RWHttpRequest object. A Content-Length header is automatically attached to the request indicating the length of the body object.
//2 Constructs an empty RWHttpHeaderList. Additional headers are not added to the request, but the object is passed as an argument to the RWHttpRequest constructor so that the body can also be passed.
//3 Constructs an RWHttpRequest object that executes an M-POST request to the /script.cgi location with the string user=roguewave sent to the server as the body of the request.
When submitted to an HTTP server through an RWHttpClient connected to www.roguewave.com, the request sends the following data:
 
M-POST /script.cgi HTTP/1.1
Host: www.roguewave.com
Content-Length: 14
 
user=roguewave
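To make the automatically computed Content-Length concrete, the following standalone sketch formats the same request text using only the C++ standard library. It does not use the Rogue Wave classes; the function name formatRequest is hypothetical, and the header order simply mirrors the data shown above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the HTTP package: builds the text of an
// HTTP/1.1 request that carries a body, using any method token.
std::string formatRequest(const std::string& method,
                          const std::string& path,
                          const std::string& host,
                          const std::string& body)
{
    assert(!method.empty() && !path.empty());

    std::string text = method + " " + path + " HTTP/1.1\r\n";
    text += "Host: " + host + "\r\n";
    // Content-Length is the number of bytes in the body.
    text += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    text += "\r\n";  // a blank line separates the headers from the body
    text += body;
    return text;
}
```

Calling formatRequest("M-POST", "/script.cgi", "www.roguewave.com", "user=roguewave") produces a Content-Length of 14, the number of characters in the body string.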
5.4.2 Specifying a Custom Message-Body Handler
By default, RWHttpClient::getReply() returns an RWHttpReply object. The body of the message is stored as an RWCString internally and is accessible through the RWHttpReply::getBody() member function. You can specify an alternate mechanism for reading the body of the message from the underlying data connection by passing an RWTFunctor handler to the RWHttpClient::getReply() function.
Example 12 creates a function that reads from a portal and writes the associated data to a file.
Example 12 – Creating a function that reads from a portal
void writeToFile(RWPortal portal, RWCString filename) {
    RWStreamCoupler coupler(RWStreamCoupler::binary); // 1
    ofstream ostrm(filename); // 2
    RWPortalIStream istrm(portal); // 3
    coupler(istrm, ostrm); // 4
}
//1 Constructs an RWStreamCoupler object that redirects the portal stream to the file stream. For more information on RWStreamCoupler, see Section 3.5, “About RWStreamCoupler.”
//2 Constructs a file output stream from filename, where the body of the message is stored.
//3 Constructs an RWPortalIStream from the portal that was passed to the function. For more information on RWPortalIStream, see the Essential Networking User’s Guide.
//4 Couples the portal input and file output streams together so that all data that is read from istrm is written to ostrm.
After writing a function that can read from a portal and write the data to a file, you can create an RWTFunctor handler that uses the function to read the body of an HTTP reply, as shown in Example 13.
Example 13 – Using an RWTFunctor handler
RWHttpClient client = RWHttpSocketClient::make();
// initialize, connect, and submit a request…
 
RWTFunctor<void(RWPortal)> handler; // 1
 
handler = rwBind(writeToFile, rw1, "request.out"); // 2
 
RWHttpReply reply = client.getReply(handler); // 3
//1 Creates an RWTFunctor<> object that is templatized on the signature of the handler: a function that takes an RWPortal and returns void.
//2 Initializes the object handler with the function to execute and with the second parameter that is not supplied by the RWHttpClient::getReply() invocation.
//3 Invokes the getReply() member function on client. The reply returned from this invocation does not contain a body, which means that RWHttpReply::getBody() returns an empty string. However, the handler stores the message body in the file request.out.
RWHttpAgent does not include a mechanism for specifying a body handler. If the body of a message requires special treatment (for instance, it cannot be stored temporarily in an RWCString), then your application must use RWHttpClient to retrieve the document.
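The handler pattern in Example 13 (a callable that receives the data source, with the remaining argument bound in advance) can be illustrated without the Rogue Wave classes. The sketch below is an analogy only: it uses std::function and a lambda in place of RWTFunctor and rwBind, and standard streams in place of RWPortal and the output file. The names BodyHandler, copyBody, and makeHandler are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>

// Stand-in for the handler type: a callable that consumes the body stream.
using BodyHandler = std::function<void(std::istream&)>;

// Copies everything readable from 'in' to 'out', one buffer at a time.
void copyBody(std::istream& in, std::ostream& out)
{
    char buffer[4096];
    // read() fails on the final short read, so also check gcount().
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
        out.write(buffer, in.gcount());
    }
}

// Binds the destination in advance, leaving a one-argument handler.
// The caller must keep '*out' alive for as long as the handler is used.
BodyHandler makeHandler(std::ostream* out)
{
    assert(out != nullptr);
    return [out](std::istream& in) { copyBody(in, *out); };
}
```

As with the functor in Example 13, the code that invokes the handler supplies only the data source; the destination was fixed when the handler was created.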
5.4.3 Downloading Part of a Document
When a download is interrupted before it is complete, your application needs to be able to retrieve only the part of the document that was not received. The HTTP/1.1 specification and the Rogue Wave HTTP classes enable you to do this through the HTTP Range header. Example 14 uses a simple GET request to retrieve the first 500 bytes of a document.
Example 14 – Retrieving part of a document with RWHttpClient
RWHttpClient client = RWHttpSocketClient::make();
// initialize, and connect the client…
 
RWHttpRangeHeader rangeHeader(0, 499); // 1
 
RWHttpHeaderList headerlist;
headerlist.addHeader(rangeHeader); // 2
 
RWHttpRequest request(RWHttpRequest::Get, "/", headerlist); // 3
 
client.submit(request);
 
// retrieve reply from the server
//1 Constructs an RWHttpRangeHeader that specifies the byte range to be retrieved. In this case, it is the first 500 bytes of the document.
//2 Adds the Range header to the header list.
//3 Includes the Range header in the request object.
RWHttpAgent also enables you to retrieve a portion of a document. Example 15 shows how the previous example is implemented using RWHttpAgent.
Example 15 – Retrieving part of a document with RWHttpAgent
RWHttpAgent agent;
RWTIOUResult<RWHttpReply> replyIOU;
 
replyIOU = agent.executeGetRange("http://www.roguewave.com/",
                                 0, 499);
For more information about partial downloads, see the ResumableDownload sample program distributed with the product.
Sample programs are located in the examples directory created for your installation. For more information, see Installing and Building Your SourcePro C++ Products and Building Your Applications.
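For the interrupted-download case described at the start of this section, the request needs a range that starts at the first byte not yet received. The sketch below only formats Range header values using the HTTP/1.1 byte-range syntax; it does not call the Rogue Wave classes, and the function names are hypothetical. Whether and how RWHttpRangeHeader expresses an open-ended range is not covered here; see the API reference.

```cpp
#include <cassert>
#include <string>

// Value for a closed range, for example the first 500 bytes: "bytes=0-499".
std::string rangeValue(unsigned long first, unsigned long last)
{
    assert(first <= last);  // byte positions are zero-based and inclusive
    return "bytes=" + std::to_string(first) + "-" + std::to_string(last);
}

// Value for "everything from this offset to the end of the document".
// After 'bytesReceived' bytes have arrived, the next byte wanted has
// exactly that index, because positions start at zero.
std::string resumeRangeValue(unsigned long bytesReceived)
{
    return "bytes=" + std::to_string(bytesReceived) + "-";
}
```

For a download that stopped after the first 500 bytes, resumeRangeValue(500) gives the value bytes=500-, which asks for the remainder of the document.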
5.4.4 Adding and Removing Headers
The RWHttpClient classes automatically add Host and Content-Length headers when required by the HTTP protocol, but you can also add headers to the request or you can replace the defaults. RWHttpRequest takes an optional RWHttpHeaderList parameter that adds headers to the HTTP request, as shown in Example 16.
Example 16 – Adding headers
RWHttpDateHeader date("Sun, 06 Nov 1994 08:49:37 GMT"); // 1
RWHttpHeaderList headerlist;
headerlist.addHeader(date); // 2
 
RWHttpRequest request(RWHttpRequest::Get, "/", headerlist); // 3
//1 Constructs an RWHttpDateHeader from a date string.
//2 Appends the header date to a header list.
//3 Adds headerlist to the RWHttpRequest request. The outgoing HTTP request contains each of the headers in headerlist.
When submitted to an HTTP server through an RWHttpClient connected to www.roguewave.com, the request sends the following data:
 
GET / HTTP/1.1
Date: Sun, 06 Nov 1994 08:49:37 GMT
Host: www.roguewave.com
The HTTP package also includes:
* Helper classes for formatting and parsing specific HTTP headers
* A generic header class, RWHttpGenericHeader, for preparing any label-value pair for inclusion as a header in an HTTP request
For more information about these helper classes, see the SourcePro C++ API Reference Guide.
RWHttpAgent also enables you to specify additional headers. It has two member functions, addCustomHeader() and removeCustomHeader(), that you can use to add headers to, and remove headers from, outgoing HTTP requests:
Example 17 – Using RWHttpAgent to add headers
RWHttpFromHeader from("user@roguewave.com"); // 1
agent.addCustomHeader(from); // 2
//1 Constructs an HTTP From header with the value user@roguewave.com.
//2 Adds the header from to the agent.
Every request issued through the agent in Example 17 includes an additional From header line that identifies who the request is from.
You can remove headers using the RWHttpAgent::removeCustomHeader() member function.
Once a header is associated with an RWHttpAgent object, all requests using that object send that header. If you add a custom header that should only be sent with a particular request, you need to remove it from the agent before issuing more requests.
5.4.5 Adding a Maximum Wait to Requests
Unexpected network traffic or network problems can cause your application to wait. To plan for these situations, the HTTP package classes include maxwait parameters.
The connect(), submit(), and getReply() member functions in RWHttpClient include an optional maxwait parameter, which is the maximum number of milliseconds that the RWHttpClient and related classes wait for data to become available from the server. Example 18 shows how to use maxwait.
Example 18 – Adding a maximum wait to requests
RWHttpClient client = RWHttpSocketClient::make();
 
client.connect("www.roguewave.com", 80, 2000); // 1
//1 Issues a connection request to the server www.roguewave.com:80 with a maximum wait of 2 seconds (2000 milliseconds).
RWHttpAgent also enables you to specify a timeout period, but it applies to all requests executed by a given RWHttpAgent object.
 
RWHttpAgent agent;
agent.setNetworkMaxwait(2000);
All requests executed through agent have a maximum wait of 2000 milliseconds (2 seconds).
Specifying a maxwait value does not guarantee that a member function returns within the time specified. An exception is thrown only if no data is received from the server within the maxwait period. As long as some data keeps arriving within each maxwait period, no exception is thrown, and the call can take longer than maxwait overall.
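The difference between a limit on each wait and a limit on total time can be shown with a small simulation. This is a sketch of the behavior described above, not the library's implementation. The type and function names are hypothetical, and each gap represents the milliseconds that pass before the next chunk of data arrives.

```cpp
#include <cassert>
#include <vector>

struct TransferResult {
    bool timedOut;          // true if a single wait exceeded maxwait
    unsigned long elapsed;  // total milliseconds spent waiting
};

// Each entry in 'gapsMs' is the wait before the next chunk of data arrives.
// The limit applies to each wait separately, not to the sum of the waits.
TransferResult simulateTransfer(const std::vector<unsigned long>& gapsMs,
                                unsigned long maxwaitMs)
{
    assert(maxwaitMs > 0);
    TransferResult result{false, 0};
    for (unsigned long gap : gapsMs) {
        if (gap > maxwaitMs) {
            result.elapsed += maxwaitMs;  // gave up after waiting maxwait
            result.timedOut = true;
            break;
        }
        result.elapsed += gap;
    }
    return result;
}
```

With a maxwait of 2000, three gaps of 1500 milliseconds complete without a timeout even though the transfer takes 4500 milliseconds in total, while a single gap of 2500 milliseconds triggers the timeout.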
5.4.6 Sending Requests through HTTP Proxies
When HTTP requests are sent to an HTTP proxy (such as a firewall) that forwards the request to the target HTTP server, your application must include the following steps:
* The client must connect to the proxy HTTP server instead of the actual HTTP server.
* The request must contain a Host header that points to the actual server that the request is intended for.
Example 19 shows a simple GET request executed through a proxy HTTP server.
Servers and files shown in the code might not exist and are included as examples only.
Example 19 – Sending a request through a proxy HTTP server
RWHttpClient client = RWHttpSocketClient::make();
 
RWCString proxyServer = "proxy.somehost.com"; // 1
RWCString actualServer = "www.roguewave.com"; // 2
RWCString path = "/";
 
client.connect(proxyServer); // 3
 
RWHttpHostHeader hostHeader(actualServer); // 4
 
RWHttpHeaderList headerlist;
headerlist.addHeader(hostHeader); // 5
 
RWHttpRequest request(RWHttpRequest::Get, path, headerlist); // 6
 
client.submit(request); // 7
 
// retrieve reply from the server...
//1 Constructs an RWCString that identifies the proxy HTTP server.
//2 Constructs an RWCString that identifies the actual HTTP server.
//3 Attempts to connect to the proxy HTTP server.
//4 Constructs a Host header with the actual HTTP server as its value.
//5 Adds the Host header to an RWHttpHeaderList object.
//6 Constructs an RWHttpRequest object to GET the root document from a server. The RWHttpRequest object contains an additional header that identifies the actual target server that the document should be retrieved from.
//7 Submits the request for the actual HTTP server to the RWHttpClient connected to the proxy HTTP server.
To issue a request through a proxy HTTP server with RWHttpAgent, specify both servers in a single URL, with the URL of the actual server following the address of the proxy:
 
RWHttpAgent agent;
 
agent.executeGet(
    "http://proxy.somehost.com/http://www.roguewave.com/");
5.4.7 Persistent Connections and Pipelined Requests
The HTTP package classes support both persistent connections and pipelined requests, as described in RFC 2616. A persistent connection allows multiple request-response pairs to be exchanged over a single connection, which avoids the latency of establishing a new connection for each request.
The HTTP package classes support persistent connections under HTTP/1.1 whenever possible.
Pipelined requests are multiple requests issued over a persistent connection without waiting for the replies to earlier requests. The time between sending a request and receiving its reply is used to send further requests, which shortens the total time needed for the series.
Pipelining carries risk. If a request has a side effect on the server, the server might generate and prepare the response to a later pipelined request before the prior request's side effect is completed. If the side effect would have changed what the later request returns, the pipelined response to that request may not be accurate. The following table gives an example of this situation.
| Without Pipelining | With Pipelining |
| --- | --- |
| Client Request 1: Remove Document A | Client Request 1: Remove Document A |
| Server Action: Send “Document A Removed Message” | Client Request 2: Get Document A |
| Server Action: Remove Document A | Server Action: Send “Document A Removed Message” |
| Client Received: Document A Removed | Server Action: Send Document A |
| Client Request 2: Get Document A | Server Action: Remove Document A |
| Server Action: Send “Document A does not exist Message” | Client Received: Document A Removed |
| Client Received: Document A does not exist | Client Received: Document A |
If you determine that a series of requests can be pipelined safely, you can issue them using the RWHttpClient class by passing an additional parameter, RW_HTTP_ALLOW_PIPELINING, to submit(). Example 20 issues two requests to the same server without waiting for a reply.
Servers and files shown in the code might not exist and are included as examples only.
Example 20 – Using pipelined requests
RWHttpClient client = RWHttpSocketClient::make();
 
client.connect("www.roguewave.com");
 
RWHttpRequest root(RWHttpRequest::Get, "/");
RWHttpRequest products(RWHttpRequest::Get, "/products/");
 
client.submit(root, RW_HTTP_ALLOW_PIPELINING);
client.submit(products, RW_HTTP_ALLOW_PIPELINING);
 
RWHttpReply rootReply = client.getReply();
RWHttpReply productsReply = client.getReply();
For more information about the dangers of pipelining, see section 8.1.2.2 and section 9.1.2 of RFC 2616.
RWHttpAgent does not include a mechanism for specifying pipelined requests, but it does use persistent connections internally to reduce latency due to handshaking during connections.
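RFC 2616 requires a server to send replies to pipelined requests in the order in which the requests were received, so in Example 20 the first reply read answers the first request submitted. A minimal sketch of that bookkeeping, using a hypothetical PipelineTracker class and no Rogue Wave types, might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>

// Hypothetical helper: remembers which request each upcoming reply answers.
class PipelineTracker {
public:
    // Call when a request is submitted.
    void submitted(const std::string& path) { pending_.push(path); }

    // Call when a reply is read; returns the path of the request it answers.
    std::string replied()
    {
        assert(!pending_.empty());  // a reply must match an earlier request
        std::string path = pending_.front();
        pending_.pop();
        return path;
    }

    // Number of requests still waiting for a reply.
    std::size_t outstanding() const { return pending_.size(); }

private:
    std::queue<std::string> pending_;  // first in, first out
};
```

An application that pipelines more than a couple of requests could record each path at submit time and consult the tracker each time it reads a reply.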
5.4.8 HTTP and International Documents
The HTTP package allows documents in different locales and character sets to be downloaded and processed in a C++ application. Combined with the Internationalization Module of SourcePro Core, it provides a complete solution for working with documents in various character sets. Example 21 retrieves a page, determines its character set from the Content-Type header, and converts the body to UTF-8 using the Internationalization Module.
The following example uses classes from the Internationalization Module of SourcePro Core. For more information on the Internationalization Module, refer to the Internationalization Module User’s Guide and SourcePro C++ API Reference Guide.
Example 21 – Using HTTP with the Internationalization Module
// Create a URL for the target web page.
RWURL url("http://www.amazon.co.jp/");

// Create a string to hold the charset, default to US-ASCII.
RWCString charset = "US-ASCII";

// Connect to the web server and retrieve the page specified.
RWHttpAgent agent;
RWHttpReply reply = agent.executeGet(url);

// Check and see if a Content-Type header is present.
RWHttpHeaderList headers = reply.getHeaders();
size_t index = headers.index("Content-Type");

if (index != RW_NPOS)
{
    // A Content-Type header is present, extract it.
    RWHttpContentTypeHeader ctHeader(headers[index]);

    // Check and see if a charset is present.
    RWCString tmp = ctHeader.getParameterValue("charset");
    if (!tmp.isNull())
    {
        // We found an alternate charset.
        charset = tmp;
    }
}

// Create converters from the original charset of the message
// to UTF-8.
RWUToUnicodeConverter fromMsgCharset(charset);
RWUFromUnicodeConverter toUtf8("UTF-8");

// Create a RWUString from the body of the message.
RWUString body(reply.getBody(), fromMsgCharset);

// Output the body of the message as UTF-8.
cout << body.toBytes(toUtf8) << endl;
return 0;
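Example 21 relies on RWHttpContentTypeHeader to parse the header value. To show what that lookup involves, here is a simplified standalone sketch that uses only the standard library. The function name charsetOf is hypothetical, and the parsing handles only the common "type/subtype; name=value" form with optional double quotes, not every case the HTTP grammar allows.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns the charset parameter of a Content-Type value, or 'fallback' when
// none is present. The parameter name is matched case-insensitively.
std::string charsetOf(const std::string& contentType,
                      const std::string& fallback)
{
    assert(!fallback.empty());  // the caller supplies the default charset

    // Lowercase copy for searching; same length as the original, so
    // positions found here are valid in 'contentType' too.
    std::string lower;
    for (unsigned char c : contentType) {
        lower += static_cast<char>(std::tolower(c));
    }

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos) return fallback;
    pos += key.size();

    // The value runs to the next ';' or to the end of the string.
    std::string::size_type end = contentType.find(';', pos);
    std::string value = (end == std::string::npos)
                            ? contentType.substr(pos)
                            : contentType.substr(pos, end - pos);

    // Strip surrounding whitespace and double quotes.
    const std::string strip = " \t\"";
    std::string::size_type first = value.find_first_not_of(strip);
    if (first == std::string::npos) return fallback;
    std::string::size_type last = value.find_last_not_of(strip);
    return value.substr(first, last - first + 1);
}
```

As in Example 21, the fallback is used when the header carries no charset parameter, so charsetOf("text/html", "US-ASCII") returns US-ASCII.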