This example uses the Internationalization and Threads Modules with the C++ Standard Library to merge two sorted lists into a single sorted list. The input files are encoded in UTF-8, and the resulting list is written to the standard output, also in UTF-8. In the program, the strings are converted to UTF-16, as required by the classes of the Internationalization Module. An RWUCollator can then compare the strings using the Unicode Collation Algorithm for any specified locale, which determines the order of the merged output.
The standard library supplies the classes for program input and output. The Threads Module is used to implement the producer/consumer model through thread functions (RWThreadFunction) and synchronized queues (RWTPCValQueue). This allows concurrent processing of the two input streams, which could significantly improve performance for large data sets.
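To make the producer/consumer hand-off concrete without the Threads Module, the following is a minimal sketch in standard C++ of a bounded blocking queue. The names `BoundedQueue` and `produceAndSum` are hypothetical and exist only for illustration. The sketch assumes that a synchronized queue blocks writers when it is full and readers when it is empty; it is not a description of how RWTPCValQueue is implemented.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a synchronized queue: write() blocks while the
// queue is full, and read() blocks while it is empty.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void write(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        notEmpty_.notify_one();
    }

    T read() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        notFull_.notify_one();
        return value;
    }

private:
    std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
    std::queue<T> items_;
};

// Runs one producer thread that writes 1..count followed by a 0 end marker,
// while the calling thread consumes the values and sums them.
inline long produceAndSum(int count, std::size_t capacity) {
    BoundedQueue<int> queue(capacity);
    std::thread producer([&queue, count] {
        for (int i = 1; i <= count; ++i) {
            queue.write(i);
        }
        queue.write(0);  // end-of-data marker (0 never appears as data here)
    });

    long sum = 0;
    for (int value = queue.read(); value != 0; value = queue.read()) {
        sum += value;
    }
    producer.join();
    return sum;
}
```

Calling `produceAndSum(100, 10)` pushes one hundred values through a queue that holds at most ten at a time, so the producer and consumer are forced to take turns. The small capacity bounds memory use regardless of input size, which is the main reason to prefer a bounded queue for large data sets.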
// Concurrent Merge Example

// Hard-coded input file names
#define INPUT_FILE_1 "../in/sorted1.dat"
#define INPUT_FILE_2 "../in/sorted2.dat"

// Include the headers defining how to make thread functions,
// critical sections, and synchronized queues
#include <rw/thread/rwtMakeThreadFunction.h>
#include <rw/sync/RWCriticalSection.h>
#include <rw/itc/RWTPCValQueue.h>

// Include headers from the Internationalization Module for
// Unicode strings, collators, and encoding conversions
#include <rw/i18n/RWUString.h>
#include <rw/i18n/RWULocale.h>
#include <rw/i18n/RWUCollator.h>
#include <rw/i18n/RWUToUnicodeConversionContext.h>
#include <rw/i18n/RWUFromUnicodeConversionContext.h>

// Include headers for iostreams, file streams, and EXIT_FAILURE
#include <cstdlib>
#include <iostream>
#include <fstream>

using std::cerr;
using std::cout;
using std::ifstream;
using std::endl;

// Create a typedef for a synchronized queue of RWUString objects
typedef RWTPCValQueue<RWUString> SQueue;

// The Producer function represents the producer in the
// producer/consumer model. The producer uses a specified input
// stream to read strings encoded in UTF-8. The strings are
// converted from UTF-8 to UTF-16 using an
// RWUToUnicodeConversionContext. The converted strings are
// then written to an RWTPCValQueue, for consumption by a consumer.
// Once the end of the file is reached, a well-known string, "555",
// is written into the queue to tell the consumer that there is no
// more data from this producer. This example creates two Producer
// instances, one for each of two sorted input files.
void Producer(ifstream& input, SQueue& queue)
{
  // Create the UTF-8 conversion context
  RWUToUnicodeConversionContext context("UTF-8");

  RWUString s;

  // As long as a string can be read from the file, converting
  // from UTF-8 to UTF-16... (Testing the result of the read,
  // rather than eof() before the read, avoids queuing a stale
  // string after the final read attempt fails.)
  while (input >> s) {
    // Write the string to the queue
    queue.write(s);
  }

  // Write "555" to the queue to mark the end of data
  queue.write(RWUString("555"));
}

// The Consumer function represents the consumer in the
// producer/consumer model. A conversion context is created to
// convert the producer strings from UTF-16 to UTF-8. As strings
// are read from the producers, two strings (one from each
// producer) are compared using an RWUCollator at Secondary
// strength. (Secondary strength considers differences in
// basic character identity, and possibly diacritics. Differences
// in case are ignored at the secondary level.) The lesser of the
// two strings is written to the output stream, preserving the
// ordering of the strings in the final output. When the
// strings are written to the output stream, the conversion context
// is used to convert the strings from UTF-16 to UTF-8.
void Consumer(SQueue& q1, SQueue& q2, const RWULocale& locale)
{
  // Instantiate a conversion context for converting from
  // UTF-16 to UTF-8
  RWUFromUnicodeConversionContext context("UTF-8");

  // Create an RWUCollator using the specified locale, and
  // then set its strength to secondary, which considers
  // differences in basic character identity, and possibly
  // diacritics. Differences in case are ignored at the
  // secondary level
  RWUCollator collator(locale);
  collator.setStrength(RWUCollator::Secondary);

  // Obtain one string from each of the two producers
  RWUString str1 = q1.read();
  RWUString str2 = q2.read();

  // Initialize the done flags, based on the strings
  // read from the queues
  bool q1Done = str1 == RWUString("555");
  bool q2Done = str2 == RWUString("555");

  // As long as there is data in either of the queues...
  while (!(q1Done && q2Done)) {
    // If queue 1 is done, or if the string from queue
    // 1 is greater than or equal to the string from queue 2,
    // then write the string from queue 2. Update the queue 2
    // string and flag
    if (q1Done ||
        (!q2Done && collator.compareTo(str1, str2) >= 0)) {
      cout << str2 << endl;
      str2 = q2.read();
      q2Done = str2 == RWUString("555");
    }
    // Else if queue 2 is done, or if the string from queue
    // 1 is less than the string from queue 2, then write the
    // string from queue 1. Update the queue 1 string and flag
    else if (q2Done ||
             (!q1Done && collator.compareTo(str1, str2) < 0)) {
      cout << str1 << endl;
      str1 = q1.read();
      q1Done = str1 == RWUString("555");
    }
  }
}

// Main
int main(int argc, char* argv[])
{
  // Open the input files
  ifstream input1(INPUT_FILE_1);
  if (!input1) {
    cerr << "Unable to open " << INPUT_FILE_1
         << ", aborting." << endl;
    return EXIT_FAILURE;
  }

  ifstream input2(INPUT_FILE_2);
  if (!input2) {
    cerr << "Unable to open " << INPUT_FILE_2
         << ", aborting." << endl;
    return EXIT_FAILURE;
  }

  // Create a locale for use in collating the input strings. By
  // default, use the United States English locale. If a locale
  // is given on the command line, then use it.
  RWULocale locale("en_US");
  if (argc > 1) {
    locale = RWULocale(argv[1]);
  }

  // Create two synchronized queues, one for each producer
  SQueue q1(10);
  SQueue q2(10);

  // Create the producers
  RWThread producer1 =
    rwtMakeThreadFunctionGA2(void, Producer,
                             ifstream&, input1, SQueue&, q1);
  RWThread producer2 =
    rwtMakeThreadFunctionGA2(void, Producer,
                             ifstream&, input2, SQueue&, q2);

  // Create the consumer
  RWThread consumer =
    rwtMakeThreadFunctionGA3(void, Consumer,
                             SQueue&, q1, SQueue&, q2,
                             const RWULocale&, locale);

  // Write a message to show the start of the merge
  cout << "Starting merge..." << endl << endl;

  // Start the producers and the consumer
  producer1.start();
  producer2.start();
  consumer.start();

  // Wait for all threads to terminate
  producer1.join();
  producer2.join();
  consumer.join();

  // Write a message to mark the end of the merge
  cout << endl << "Done." << endl;

  // Return success
  return 0;
}
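The consumer's merge loop can also be studied in isolation. The sketch below restates it in standard C++ (C++17) under two assumptions that differ from the example: the sources are plain in-memory sequences rather than synchronized queues, and the end of data is signaled out of band with `std::nullopt` instead of the in-band string "555", so a data value equal to the marker cannot end a stream early. The names `Source`, `makeSource`, `byteCompare`, and `mergeSorted` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical source type: each call returns the next string, or
// std::nullopt once the source is exhausted.
using Source = std::function<std::optional<std::string>()>;

// Wraps a vector so that it can be read one element at a time.
inline Source makeSource(std::vector<std::string> items) {
    std::size_t next = 0;
    return [items = std::move(items), next]() mutable
               -> std::optional<std::string> {
        if (next == items.size()) {
            return std::nullopt;
        }
        return items[next++];
    };
}

// Plain code-unit comparison; a locale-aware collator would be passed
// to mergeSorted in its place.
inline int byteCompare(const std::string& x, const std::string& y) {
    return x.compare(y);
}

// Two-way merge: repeatedly emit the smaller head, taking from the second
// source on ties, which mirrors the comparison order of the consumer above.
inline std::vector<std::string> mergeSorted(
    Source& first, Source& second,
    const std::function<int(const std::string&, const std::string&)>& compare) {
    assert(compare && "comparator must be callable");

    std::vector<std::string> merged;
    std::optional<std::string> a = first();
    std::optional<std::string> b = second();

    // As long as either source still has a pending string...
    while (a || b) {
        if (!a || (b && compare(*a, *b) >= 0)) {
            merged.push_back(*b);
            b = second();
        } else {
            merged.push_back(*a);
            a = first();
        }
    }
    return merged;
}
```

Because the comparator is a parameter, the same loop produces a different merged order when a different collation rule is supplied, which is the effect the locale argument has in the example.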
The two outputs below demonstrate how the merged order depends on the locale. The first uses the default U.S. English locale; the second uses the Spanish traditional locale, passed to the program as the first command-line argument. The two words treated differently are chorizo and llama: traditional Spanish collation treats "ch" and "ll" as single letters that sort after "c" and "l", respectively.
Program output, default en_US Locale:

Starting merge...

agua
ahora
azul
azur
blanco
blanco
cabeza
caliente
chorizo
curioso
despues
donde
familia
hombre
limpio
llama
luna
luz
madre
mano
nombre
oreja
padre
rojo
rosa

Done.

Program output, es__traditional Locale:

Starting merge...

agua
ahora
azul
azur
blanco
blanco
cabeza
caliente
curioso
chorizo
despues
donde
familia
hombre
limpio
luna
luz
llama
madre
mano
nombre
oreja
padre
rojo
rosa

Done.
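The placement of chorizo and llama in the second output follows from traditional Spanish ordering, in which "ch" and "ll" are treated as single letters that sort after "c" and "l". The sketch below illustrates that rule with a hand-built sort key in standard C++. It is a simplified illustration that handles only lowercase ASCII words; it is not the Unicode Collation Algorithm and not how RWUCollator works internally. The names `traditionalKey` and `compareTraditional` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds a sort key for a lowercase ASCII word. Each letter a..z gets the
// even weight 2 * (letter index). The digraphs "ch" and "ll" get the odd
// weight just above 'c' and 'l', so they sort as separate letters that fall
// between c and d, and between l and m.
inline std::vector<int> traditionalKey(const std::string& word) {
    std::vector<int> key;
    for (std::size_t i = 0; i < word.size(); ++i) {
        const char c = word[i];
        assert(c >= 'a' && c <= 'z');  // sketch handles lowercase ASCII only
        const bool digraph =
            i + 1 < word.size() &&
            ((c == 'c' && word[i + 1] == 'h') ||
             (c == 'l' && word[i + 1] == 'l'));
        key.push_back(2 * (c - 'a') + (digraph ? 1 : 0));
        if (digraph) {
            ++i;  // consume the second character of the digraph
        }
    }
    return key;
}

// Returns a negative, zero, or positive value, in the style of a
// collator's compare call.
inline int compareTraditional(const std::string& x, const std::string& y) {
    const std::vector<int> kx = traditionalKey(x);
    const std::vector<int> ky = traditionalKey(y);
    if (kx < ky) {
        return -1;
    }
    if (ky < kx) {
        return 1;
    }
    return 0;
}
```

With this key, `compareTraditional("curioso", "chorizo")` and `compareTraditional("luz", "llama")` are both negative, matching the es__traditional output, whereas a plain character-by-character comparison places chorizo before curioso and llama before luna.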
Copyright © Rogue Wave Software, Inc. All Rights Reserved.
The Rogue Wave name and logo, and SourcePro, are registered trademarks of Rogue Wave Software. All other trademarks are the property of their respective owners.