Chapter 17 Data Extraction Classes
Overview
 
Building the Libraries
Developers can use Objective Toolkit’s data extraction classes to extract data from text streams. These classes depend on an underlying third-party regular expression library, Regex++. The Regex++ v.2.2.4 library is distributed and installed with Objective Toolkit. The URL for the Regex++ Web site (as of the publication date of this manual) is http://www.boost.org/doc/libs/1_40_0/libs/regex/doc/html/index.html
NOTE >> Read the regex.htm document (located in <stingray-installdir>\Regex), which is installed as part of the Regex++ library.
 
NOTE >> Regex++ is distributed under the following copyright, which is reproduced here as required.

Regex++ Copyright © 1998-9 Dr. John Maddock Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Dr. John Maddock makes no representations about the suitability of this software for any purpose. It is provided “as is” without express or implied warranty.
To build the Objective Toolkit data extraction classes:
1. Run the Objective Toolkit Build Wizard.
2. Select HTML data extractor under the Utility Classes section in Objective Toolkit.
Figure 140 – Select HTML data extractor when running the Objective Toolkit Build Wizard.
3. Build the Objective Toolkit libraries using the newly generated make files.
Regex++ supports the new HTML export functionality. The HTML data extractor component requires the Regex++ library, which makes regular expression matching available to projects that incorporate Objective Toolkit.
Objective Toolkit has an optional dependency on the Regex++ library, which in turn has other dependencies. As a result, additional files (other than the Objective Toolkit DLLs themselves) may need to be distributed with your application.
NOTE >> The Toolkit build wizard now contains a checkbox to enable or disable the requirement for Regex. This checkbox is unchecked by default. The Regex library solution files have also been updated to output the Regex libraries to the appropriate MFC product library directory for each platform. For example, Win32 VC++ <ver> Regex libraries are now output to <stingray-installdir>\Lib\<ver>\x86. This output pattern applies to all supported compilers and to the x64 editions of the libraries, so the IDE path for Regex is no longer needed. For more detailed information about these IDE paths, refer to “Stingray Studio Paths in Property Sheets” in the Getting Started part. Information about redistributables is in Appendix 6 of the regex.htm file located at <stingray-installdir>\Regex. As described in regex.htm, you can also link to a static version of Regex++ and avoid installing additional libraries.
Using Regular Expression Libraries
Perl and similar languages are commonly used to solve the frequently encountered programming problem of extracting and manipulating data in text streams. Tools such as lex and yacc can also be used to generate custom scanners and parsers.
C++ has several regular expression libraries that provide similar functionality, and with them, extracting information from a text stream is straightforward. Consider the following text:
<HTML>City of Raleigh, NC, temp at 7 P.M. <b>52 degrees</b> </HTML>
Using a regular expression like the one shown below, it is easy to extract the city name, the state, and the temperature at a given time as separate data elements.
<HTML>City of ([A-Za-z]*), ([A-Za-z]{1,2}), temp at ([0-9]{1,2}) (P\.M|A\.M)\. <b>([0-9]{1,3}) degrees</b> </HTML>