
Using Open Source Libraries with C++Builder 6.0

By Harold Howe, Big Creek Software, LLC

Creator of the bcbdev.com website
Member of TeamB
Co-author of C++Builder How-To
Contributor to C++Builder 4 Unleashed by Charlie Calvert and Kent Reisdorph
hhowe@bcbdev.com


1: Introduction


The popularity of open source libraries has soared over the past decade. With commercial software companies demanding more and more money for proprietary systems that seem to do less and less as time goes on, many developers are turning towards open source libraries as a solution to their problems. Open source libraries tend to be free, and by definition, come complete with source code.

If you are reading this article, chances are that you are interested in utilizing open source tools in your development. Maybe it is the openness and the freedom that attracted you. Or perhaps you are simply looking for libraries that endeavor to attain a higher degree of quality. Or maybe you are sick of paying for new versions of proprietary software where the only new feature seems to be an armada of new bugs. Regardless of the reason, you are here to read about how to leverage open source technologies in C++Builder.

This article focuses on six open source projects: Boost, Xerces, Flex, Bison, ACE and TAO, and PCRE. All six of these tools are open source and free for commercial and non-commercial use. Furthermore, all of them work well with C++Builder.

The focus of this article is not to cover these 6 tools in depth. That would take far too much time. Instead, this article demonstrates how to install and configure the tools for use with C++Builder 6. The article also contains a variety of example projects, but the examples are elementary in nature (hello world type of examples). The article focuses on installation and configuration for a simple reason: many of these tools lack instructions for using them with C++Builder. This article attempts to fill this gap while deferring complex topics to other books and websites.


2: Boost


2.1 Introduction

Boost (boost.org) is a collection of C++ libraries. The libraries come in source form and work with a variety of compilers and platforms. The current version of Boost is 1.27.0, although more recent additions to the Boost source code can be downloaded from the Boost SourceForge repository at http://sourceforge.net/projects/boost/.

Boost offers a wide variety of utilities. They include:

array        : a container for fixed size arrays
graph        : aka 'BGL'. a set of graph algorithms
python       : maps C++ classes into python
regex        : a regular expression library
smart_ptr    : a collection of smart pointer classes
static_assert: compile time assertions
thread       : portable C++ threading library

Note that this is not a complete list. Visit http://www.boost.org/libs/libraries.htm for a complete list of libraries.

2.2 Installing Boost

Many of the Boost libraries are C++ template classes that reside entirely in header files. You can use these libraries by simply extracting the Boost files and configuring the include path in the BCB environment. Some of the Boost libraries need to be compiled into library form. They include the regex, thread, and python libraries.

Installation steps
  1. Download the entire boost archive from http://boost.sourceforge.net/release/.
  2. Extract the archive to a suitable location (e:\code\lib\boost). By default, the archive will create a subdirectory in that folder called boost_1_27_0.
  3. Create a global environment variable called BOOST_ROOT. Set it to the Boost installation directory (e:\code\lib\boost\boost_1_27_0).
  4. Add $(BOOST_ROOT) to the include path of your default BCB6 project.
  5. Download the most recent version of the configuration header borland.hpp from the Boost SourceForge repository at http://sourceforge.net/projects/boost/. This version has been updated for BCB6. Use it to overwrite $(BOOST_ROOT)\boost\config\compiler\borland.hpp.
  6. Most of the Boost libraries can be used without any additional work. Some of them do require a compilation step. These include the regex, thread, and python libraries. Unfortunately, none of these libraries work with BCB6 because of compiler bugs. If you are interested in compiling these libraries anyway, check out the build instructions at http://www.boost.org/tools/build/index.html.

Boost includes a test suite for measuring how well a compiler supports Boost. The results for BCB6 and BCB5 can be found at http://www.bcbdev.com/articles/borlandboost.htm. At the time of this writing, compiler problems prevent the following Boost libraries from working with BCB6 (keep an eye out for BCB6 patches that might fix some of these problems):

function  : function.hpp fails to compile because BCB6 does not allow
            default template arguments on a nested class. This is a
            compiler bug.
graph     : graph library does not compile with BCB6
thread    : boost::thread relies on the boost function library
python    : boost::python relies on the boost function library
regex     : the regex library compiles, but access violations occur at
            runtime. This is presumably the fault of compiler bugs.

2.3 Boost Examples

2.3.1 boost::array

Boost array is a container class that provides an STL-like container interface for a statically sized array. It provides begin and end methods for iterating the array. The array container also provides a swap method for efficiently swapping the contents of two arrays, and it provides a subscript operator for accessing elements in the array. For a complete list of member functions, consult the header file boost/array.hpp or visit the online documentation for array at boost.org (http://www.boost.org/libs/array/array.html).

The Boost array class fills a void between the STL container classes and ordinary arrays. STL containers grow dynamically. You can add and remove elements at will. As you do, the size of the container grows or shrinks with each call. Dynamic containers are more flexible than ordinary arrays; however, this flexibility incurs some overhead at runtime. You can avoid this overhead by switching back to static arrays. But arrays don't have the nice interface that the STL containers have. The Boost array container bridges this gap. It combines the speed and simplicity of a static array with the convenience of an STL container class.

Listing 2.1 shows an example of how to use the Boost array container.

//-----------------------------------------------------------------------------
//Listing 2.1: array.cpp
#include <iostream>
using namespace std;

#include "boost/array.hpp"

struct Foo
{
    int x;
    int y;
    char c;
};

int main()
{
    boost::array<Foo, 10> a = {{ {0, 0, 'a'},
                                 {1, 1, 'b'},
                                 {2, 4, 'c'},
                                 {3, 9, 'd'} }};
    // reset a[1].y
    a[1].y = 10;

    cout << "Boost array contains: " << endl;
    for (boost::array<Foo, 10>::const_iterator iter = a.begin();
         iter != a.end(); ++iter)
    {
        cout << iter->x << ',' << iter->y << ',' << iter->c << endl;
    }

    return 0;
}
//-----------------------------------------------------------------------------

The array class contains two template parameters. The first parameter specifies the type of objects that you want to hold in the array. The second parameter determines the size of the array. The code in Listing 2.1 creates an array that holds ten Foo structures.

There are a couple of points worth mentioning regarding the boost array class. First, an array object is always full, just like an ordinary array. There are no methods for adding or removing elements. The size member function always returns the size that you passed as a template argument. Secondly, notice that we use array style initialization when we construct the array. This syntax allows us to directly initialize the values in the array. In order to support this syntax, the array class does not provide any constructors. If you open boost\array.hpp, you will see that indeed the class does not have any constructors.
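
The aggregate initialization described above can be illustrated without Boost. The following is a minimal sketch, assuming a hypothetical fixed_array template and a helper named sum (neither is part of Boost, and this is not the code in boost/array.hpp). Because the struct has a public array member and no constructors, the compiler treats it as an aggregate and accepts the same double-brace initializer that Listing 2.1 uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for boost::array. It is NOT the Boost
// implementation; it only shows why the absence of constructors matters.
template <typename T, std::size_t N>
struct fixed_array
{
    T elems[N];   // public data member and no constructors: an aggregate

    typedef T*       iterator;
    typedef const T* const_iterator;

    iterator       begin()       { return elems; }
    const_iterator begin() const { return elems; }
    iterator       end()         { return elems + N; }
    const_iterator end()   const { return elems + N; }

    // The size is fixed by the template argument, so it never changes.
    std::size_t size() const { return N; }

    T& operator[](std::size_t i)
    {
        assert(i < N);   // debug-only bounds check
        return elems[i];
    }
    const T& operator[](std::size_t i) const
    {
        assert(i < N);
        return elems[i];
    }
};

// Adds up every element, including those that the initializer did not list.
int sum(const fixed_array<int, 10>& a)
{
    int total = 0;
    for (fixed_array<int, 10>::const_iterator it = a.begin();
         it != a.end(); ++it)
    {
        total += *it;
    }
    return total;
}
```

With this sketch, fixed_array<int, 10> a = {{ 1, 2, 3, 4 }}; leaves a.size() at 10, and the six elements that the initializer does not list are zero-initialized, so sum(a) yields 10. The same rule applies to the six Foo entries that Listing 2.1 leaves out of its initializer.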

2.3.2 boost::lexical_cast

lexical_cast is a template function that converts values to and from text form. The syntax for lexical_cast resembles static_cast and dynamic_cast. Listing 2.2 demonstrates how to use lexical_cast.

//-----------------------------------------------------------------------------
//Listing 2.2: lexical_cast.cpp
#include <iostream>
#include <string>
using namespace std;

#include "boost/lexical_cast.hpp"

int main()
{
    float  f = 3.14159;
    string s;

    s = boost::lexical_cast<string>(f);

    cout << "Original float      : " << f << endl;
    cout << "Converted to string : " << s << endl;
    return 0;
}
//-----------------------------------------------------------------------------

lexical_cast relies on string streams to perform the conversion. It inserts the source value into a temporary stringstream object, and then performs an extraction into the result. In order to use lexical_cast, you must provide operator << for the source type and operator >> for the destination type. The code for lexical_cast looks sort of like this:

// A simplified version of lexical_cast without error checking
template<typename Target, typename Source>
Target lexical_cast(Source arg)
{
    std::stringstream interpreter;
    Target result;

    interpreter << arg;
    interpreter >> result;

    return result;
}
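
The simplified version above leaves out error checking. As a rough illustration of what such checking might look like, the sketch below adds it using only the standard library. The name checked_cast and the use of std::runtime_error are assumptions made for this example; the real boost::lexical_cast reports a failed conversion with its own exception type, boost::bad_lexical_cast.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper modeled on the simplified lexical_cast above.
// It is not the Boost implementation.
template <typename Target, typename Source>
Target checked_cast(const Source& arg)
{
    std::stringstream interpreter;
    Target result;
    char leftover;

    if (!(interpreter << arg) ||       // could not write the source value
        !(interpreter >> result) ||    // could not read the target value
        (interpreter >> leftover))     // unread characters remain
    {
        throw std::runtime_error("checked_cast: conversion failed");
    }
    return result;
}
```

Under these assumptions, checked_cast<int>(std::string("42")) returns 42, while "abc" and "42abc" both throw. One limitation of this approach is that a string target containing embedded spaces is also rejected, because the stream extraction stops at the first space.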

2.3.3 boost::smart_ptr

The smart_ptr library consists of five smart pointer classes: scoped_ptr, scoped_array, shared_ptr, shared_array and weak_ptr. The list below summarizes the purpose of each class.

scoped_ptr   : like auto_ptr, but never transfers ownership. scoped_ptr
               objects should not be stored in containers.
scoped_array : array version of scoped_ptr. Calls delete [].
shared_ptr   : reference counted smart pointer. Safe for use in containers.
shared_array : array version of shared_ptr.
weak_ptr     : stores a pointer that is already owned by a shared_ptr.

Each Boost smart pointer provides functionality that can't be found in the standard auto_ptr class. For example, scoped_ptr does not allow you to copy one object to another. This in turn prevents you from transferring ownership of the underlying pointer. scoped_ptr is more restrictive than auto_ptr in this sense. auto_ptr allows copying, but when you copy, ownership of the pointer transfers to the target. This can be a subtle source of problems. scoped_ptr allows you to explicitly state that you don't want your code to ever transfer ownership of the pointer.
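
One way to picture how a class can forbid copying is shown below. This is a minimal sketch, assuming a hypothetical scoped_holder class and a Probe type that counts live instances; it is not the Boost scoped_ptr source. The copy constructor and assignment operator are declared private and never defined, so code outside the class that tries to copy a scoped_holder fails to compile.

```cpp
#include <cassert>

// Hypothetical sketch of a non-copyable owning pointer (not boost::scoped_ptr).
template <typename T>
class scoped_holder
{
public:
    explicit scoped_holder(T* p = 0) : ptr_(p) {}
    ~scoped_holder() { delete ptr_; }

    T& operator*() const  { assert(ptr_ != 0); return *ptr_; }
    T* operator->() const { assert(ptr_ != 0); return ptr_; }
    T* get() const        { return ptr_; }

private:
    // Declared but never defined: copying is rejected by the compiler.
    scoped_holder(const scoped_holder&);
    scoped_holder& operator=(const scoped_holder&);

    T* ptr_;
};

// Probe type that counts how many instances are currently alive.
struct Probe
{
    static int live;
    Probe()  { ++live; }
    ~Probe() { --live; }
};
int Probe::live = 0;

// Returns the number of live Probe objects while the holder is in scope.
int live_inside_scope()
{
    scoped_holder<Probe> p(new Probe);
    // scoped_holder<Probe> q(p);   // error: copy constructor is private
    return Probe::live;
}
```

Because ownership can never move, the Probe created in live_inside_scope is guaranteed to be deleted when the function returns, which is the property that scoped_ptr is meant to document in your code.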

shared_ptr allows you to copy pointer objects, but it does not transfer ownership. Unlike auto_ptr, the boost shared_ptr class provides true copy semantics. It accomplishes this by maintaining a reference count. shared_ptr is probably the most commonly used of the Boost smart pointers. Listing 2.3 demonstrates how to use shared_ptr.

Tip Note:

If you allocate a block of memory with the array form of new (i.e. new T[n]), C++ requires that you free that memory by calling delete []. When designing a smart pointer class, it is impossible to know which version of new created the raw pointer. In order to support the deletion of dynamically allocated arrays, you must employ a smart pointer class that invokes delete [] instead of just delete. scoped_array and shared_array do just that. They are the array compatible versions of scoped_ptr and shared_ptr.


//-----------------------------------------------------------------------------
//Listing 2.3: shared_ptr.cpp
#include <iostream>
using namespace std;

#include "boost/smart_ptr.hpp"

class Foo
{
public:
    Foo()  { cout << " - constructed Foo: " << this << endl; }
    ~Foo() { cout << " - destroyed Foo  : " << this << endl; }
    void DoSomething()
    {
        cout << " - " << __FUNC__ << " : " << this << endl;
    }
};

void Test()
{
    cout << "beginning of Test scope" << endl;
    boost::shared_ptr <Foo> foo1 (new Foo);
    foo1->DoSomething();
    {
        cout << "beginning of inner scope" << endl;
        boost::shared_ptr <Foo> foo2(new Foo);
        foo2->DoSomething();

        cout << "Assigning foo2 to foo1 " << endl;
        foo1 = foo2;
        foo2->DoSomething(); // prove that foo2 still refers to something
        cout << "end of inner scope." << endl;
    }
    foo1->DoSomething();
    cout << "end of Test scope." << endl;
}


int main()
{
    cout << std::hex << std::uppercase;
    cout << "calling Test" << endl;
    Test();
    cout << "Test returned" << endl;
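/*
        // can't do this: foo1 and foo2 would each keep a separate reference
        // count for the same raw pointer, so the Foo would be deleted twice
        boost::shared_ptr <Foo> foo1 (new Foo);
        boost::shared_ptr <Foo> foo2 (foo1.get());
*/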
    return 0;
}
//-----------------------------------------------------------------------------
Output of Listing 2.3
calling Test
beginning of Test scope
 - constructed Foo: C76320
 - Foo::DoSomething : C76320
beginning of inner scope
 - constructed Foo: C7634C
 - Foo::DoSomething : C7634C
Assigning foo2 to foo1
 - destroyed Foo  : C76320
 - Foo::DoSomething : C7634C
end of inner scope.
 - Foo::DoSomething : C7634C
end of Test scope.
 - destroyed Foo  : C7634C
Test returned

The most interesting part of this example is the assignment of foo2 to foo1. At this point in the code, the reference count for the Foo object at 0xC76320 drops to zero and the object is destroyed. The reference count for the object at 0xC7634C increments to two. After the assignment, foo1 and foo2 maintain a pointer to the same object.

When the inner scope completes, the foo2 smart pointer is destroyed. However, this does not delete the underlying pointer. It merely decrements the reference count. At this point, the reference count drops from 2 to 1. foo1 is left holding the last reference to the object at 0xC7634C. When the Test function finally returns, foo1 is destroyed, the reference count drops to zero, and the last Foo object is finally deleted.
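
The reference counting behavior traced above can be sketched in a few lines of standard C++. The counted_ptr class and the Tracked probe type below are hypothetical and written for this illustration only; the sketch is single-threaded and is not how boost::shared_ptr is actually implemented.

```cpp
#include <cassert>

// Hypothetical, single-threaded sketch of a reference counted pointer.
template <typename T>
class counted_ptr
{
public:
    explicit counted_ptr(T* p = 0) : ptr_(p), count_(0)
    {
        // Do not leak p if the counter itself cannot be allocated.
        try { count_ = new long(1); }
        catch (...) { delete p; throw; }
    }

    counted_ptr(const counted_ptr& other)
        : ptr_(other.ptr_), count_(other.count_)
    {
        ++*count_;   // one more owner of the same object
    }

    counted_ptr& operator=(const counted_ptr& other)
    {
        if (count_ != other.count_)
        {
            ++*other.count_;   // take a share of the new object first
            release();         // then give up the share of the old one
            ptr_   = other.ptr_;
            count_ = other.count_;
        }
        return *this;
    }

    ~counted_ptr() { release(); }

    T*   operator->() const { assert(ptr_ != 0); return ptr_; }
    T*   get() const        { return ptr_; }
    long use_count() const  { return *count_; }

private:
    // Drops one reference and deletes the object when none remain.
    void release()
    {
        if (--*count_ == 0)
        {
            delete ptr_;
            delete count_;
        }
    }

    T*    ptr_;
    long* count_;
};

// Probe type that counts how many instances are currently alive.
struct Tracked
{
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;
```

Repeating the steps of Listing 2.3 with this sketch gives the same pattern: assigning one counted_ptr to another destroys the object that the target used to own, raises the shared count to two, and the count falls back to one when the inner pointer goes out of scope.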

Tip Note:

The constructors for the boost smart pointers are explicit (except for weak_ptr). Using explicit constructors prevents the compiler from implicitly converting raw pointers into smart pointers.


2.3.4 boost::static_assert

A static assertion is an error check that occurs at compile time. If the assertion fails, the compiler generates an error. Although static assertions have been around for some time, the book Modern C++ Design (Alexandrescu[2001]) has brought them into the spotlight.

The Boost static assertions library provides an easy way to perform compile time assertions. Listing 2.4 demonstrates how.

//-----------------------------------------------------------------------------
//Listing 2.4: static_assert.cpp
#include <iostream>
using namespace std;

#include "boost/static_assert.hpp"

// Test is a simple structure. The static assertion will fail if the
// pragmas are not part of the code.
//#pragma pack(1)
struct Test
{
    char c;
    short s;
    int   i;
};
//#pragma pack()

int main()
{
    // ensure that the structure was byte aligned
    BOOST_STATIC_ASSERT(sizeof(Test) == 7);

    // our code is not ready for ints that are bigger than 4.
    // force a compiler error now if larger int size is detected
    BOOST_STATIC_ASSERT(sizeof(int)  <= 4);

    cout << "Hello world" << endl;
    return 0;
}
//-----------------------------------------------------------------------------
Compiler output for Listing 2.4
[C++ Error] static_assert.cpp(20): E2450 Undefined structure
            'boost::STATIC_ASSERTION_FAILURE<0>'
[C++ Error] static_assert.cpp(20): E2109 Not an allowed type

The compiler generates an error for Listing 2.4 because the Test structure does not have the correct size. To fix the problem, either compile the code with the -a1 switch, or surround the structure definition with pragma directives that temporarily adjust the structure packing.

Tip Note:

Because static_assert is a compile time assertion, the expression that you pass it must produce a result that is known at compile time. The sizeof operator satisfies this requirement.
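
The "Undefined structure" error in the compiler output hints at how this kind of check can work. The sketch below shows one common way to build a compile time assertion; the names compile_time_check, SKETCH_STATIC_ASSERT, and half are made up for this example and are not the Boost macros. Only the true case of the template is defined, so asking for the size of the false case forces a compiler error. A run time assert is included for contrast.

```cpp
#include <cassert>

// Only the 'true' case is defined. compile_time_check<false> stays an
// incomplete type, and sizeof cannot be applied to an incomplete type.
template <bool Condition> struct compile_time_check;
template <> struct compile_time_check<true> {};

// Hypothetical macro (not BOOST_STATIC_ASSERT): a false expression makes
// the typedef refer to sizeof an undefined structure.
#define SKETCH_STATIC_ASSERT(expr) \
    typedef char sketch_static_assert_type[sizeof(compile_time_check<(expr)>)]

// Checked while compiling. This relation is guaranteed by the language.
SKETCH_STATIC_ASSERT(sizeof(short) <= sizeof(int));

// SKETCH_STATIC_ASSERT(sizeof(char) == 2);   // would stop the compile

// A run time assert, by contrast, is evaluated only when the call executes.
int half(int even_value)
{
    assert(even_value % 2 == 0);
    return even_value / 2;
}
```

The design choice to note is that the failing case costs nothing at run time: the program either compiles with every static check satisfied, or it does not compile at all.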


2.3.5 boost::tuple

A tuple is essentially a structure with anonymous or un-named members. It is a fixed size sequence of elements, usually of different types (a tuple of like elements is more or less an array). Some languages, such as Python, provide native support for tuples. C++ does not.

Tuples are frequently used to create functions that return more than one value. For example, the Execute method of an ADO connection object in Python returns a tuple (assuming you are running Python on Windows). The second element of the tuple is a Boolean flag that indicates whether the query succeeded. The first element contains the result set in the form of an ADO RecordSet object.

# A python code fragment that demonstrates the use of a tuple
result = ADOConnection.Execute("select * from orders")
if result[1] != 0:
    while not result[0].EOF:
        result[0].MoveNext()

The result variable is a tuple. In Python, tuples are dereferenced by index. result[1] returns the Boolean flag and result[0] gives us access to the ADO result set. Python also allows you to bind members of a tuple to variables. For example:

# A python code fragment that demonstrates tuple assignment
(rs, success) = ADOConnection.Execute("select * from orders")
if success != 0:
    while not rs.EOF:
        rs.MoveNext()

The parentheses form a tuple from the variables rs and success. Python allows this tuple to act as the target of a tuple assignment.

C++ does not provide native support for tuples. However, Boost provides a tuple library whose syntax is very easy to use. Boost tuples resemble tuples in Python. Listing 2.5 demonstrates how to utilize the Boost tuple library.

//-----------------------------------------------------------------------------
//Listing 2.5: tuple.cpp
#include <iostream>
#include <string>
using namespace std;

#include "boost/tuple/tuple.hpp"
#include "boost/lexical_cast.hpp"

boost::tuple<bool,string> GetSomeString(int index)
{
    if(index == 2)
        return boost::make_tuple(false, "");
    else
        return boost::make_tuple(true, "value: " +
                                 boost::lexical_cast<string>(index));
}

int main()
{
    boost::tuple<bool, string> t = GetSomeString(5);
    cout << "GetSomeString(5) returned tuple: " << endl
         << "   - t.get<0>() = " << t.get<0>() << endl
         << "   - t.get<1>() = " << t.get<1>() << endl << endl;

    t = GetSomeString(2);
    cout << "GetSomeString(2) returned tuple: " << endl
         << "   - t.get<0>() = " << t.get<0>() << endl
         << "   - t.get<1>() = " << t.get<1>() << endl << endl;

    bool success;
    string value;
    boost::tie(success,value) = GetSomeString(5);
    cout << "GetSomeString(5) returned tuple: " << endl
         << "   - success = " << success << endl
         << "   - value   = " << value   << endl << endl;

    boost::tie(success,value) = GetSomeString(2);
    cout << "GetSomeString(2) returned tuple: " << endl
         << "   - success = " << success << endl
         << "   - value   = " << value   << endl << endl;

    return 0;
}
//-----------------------------------------------------------------------------

The function GetSomeString returns a tuple that contains a bool and a string. By returning a tuple, the function essentially returns two results instead of one.

Listing 2.5 demonstrates two ways of working with tuples. First, you can declare a tuple object and work with it directly. To access individual elements of the tuple, call the get function. Pass the index of the element as a template argument to get. The syntax is:

using namespace boost;
tuple<int, string, float> t(1, "hello", 3.14);
get<0>(t) = 42;
get<1>(t) = "world";
float f = t.get<2>();

The second strategy for using tuples is to tie the tuple to existing variables. Listing 2.5 calls the tie utility function to bind the variables success and value. The code fragment below highlights the tie function.

using namespace boost;
tuple<bool,string> GetSomeString(int index); // same function as before

bool   b;
string s;
tie(b,s) = GetSomeString(2);
if(b)
    cout << s;
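
To give a feel for how a function like tie can work, the sketch below builds a two-element version on top of std::pair. The names ref_pair, tie_two, and get_some_string are hypothetical and exist only for this illustration; the Boost tuple library is more general and is implemented differently. The idea is that tie_two returns a small object holding references to the caller's variables, and assigning a pair to that object writes through the references.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical two-element stand-in for the object that a tie function returns.
template <typename A, typename B>
class ref_pair
{
public:
    ref_pair(A& a, B& b) : first_(a), second_(b) {}

    // Assigning a value pair writes through to the referenced variables.
    ref_pair& operator=(const std::pair<A, B>& values)
    {
        first_  = values.first;
        second_ = values.second;
        return *this;
    }

private:
    A& first_;
    B& second_;
};

// Binds two existing variables so that a pair can be assigned to them.
template <typename A, typename B>
ref_pair<A, B> tie_two(A& a, B& b)
{
    return ref_pair<A, B>(a, b);
}

// Same contract as GetSomeString in Listing 2.5, expressed with std::pair.
std::pair<bool, std::string> get_some_string(int index)
{
    if (index == 2)
        return std::make_pair(false, std::string());

    std::ostringstream text;
    text << "value: " << index;
    return std::make_pair(true, text.str());
}
```

With these definitions, tie_two(b, s) = get_some_string(5); sets b to true and s to "value: 5", mirroring the tie call in the fragment above.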

Tip Note:

The tuple examples shown here store two items in the tuple object. You can add more if you like. Boost tuples can hold up to 10 items.



3: Xerces


3.1 Introduction

The Apache Software Foundation (www.apache.org) provides a variety of open source libraries for both C++ and Java developers. One of those libraries is an XML parsing library called Xerces. Xerces is a validating XML parser. It implements the DOM 1.0, DOM 2.0, SAX 1.0, and SAX 2.0 specifications. The current version of Xerces is 1.7.0.

3.2 Installing Xerces

Of the libraries discussed in this article, Xerces is probably the easiest library to install. Xerces includes project files for BCB6. To build and install Xerces, follow these steps.

  1. Download the latest stable release of Xerces from http://xml.apache.org/dist/xerces-c/stable/. The current version at the time of this writing is 1.7.0.
  2. Extract the Xerces archive to a suitable location (e:\code\lib\xerces). The Xerces archive extracts to a subdirectory called xerces-c-src1_7_0. To make upgrades easier, you should keep this directory structure.
  3. Open the Xerces C++Builder project group. The project file is in the xerces-c-src1_7_0\Projects\Win32\BCB6\Xerces-all subdirectory. Build Xerces by selecting Project-Make All Projects (see the note below if this step appears to do nothing).
  4. The build step will create xerceslib.dll and xerceslib.lib in the xerces-c-src1_7_0\Build\Win32\BCB6\ subdirectory. These files need to be moved to a suitable location on your system. The LIB file needs to be placed in a directory where the linker can find it. The DLL needs to reside somewhere on the system path. Copy the LIB file to either $(BCB)\lib or $(BCB)\projects\lib. Copy the DLL to $(BCB)\projects\bpl, $(BCB)\bin, or your system32 directory.
  5. Start C++Builder 6.0. If BCB creates a new application on startup, perform a File-Close All operation. Create an environment variable for Xerces by following these steps:
    • Open Tools-Environment Options
    • Click New
    • Enter XERCES for the variable name.
    • Enter the Xerces root path for the value (ie e:\code\lib\xerces\xerces-c-src1_7_0)
  6. Perform the following two steps for any project that needs to use Xerces:
    • Add xerceslib.lib to the project (this is an import library for xerceslib.dll)
    • Add the paths $(XERCES)\src\xercesc and $(XERCES)\src to the include path for the project.

Tip Note:

When I try to build the Xerces project group using Project-Build All Projects, BCB6 compiles one CPP file and then stops. It won't build the entire project group. Apparently, the problem is related to the Build command in BCB6. The project group compiles with Make All Projects, but not with Build All Projects.


3.3 Xerces Examples

There are two predominant models for parsing XML documents: SAX and DOM. SAX is an event driven model, whereas DOM is a tree based object model.

DOM parsers read the entire XML document and return an in-memory tree structure that represents the content of the document. This tree structure resembles a parse tree. After parsing a document, you can navigate the tree structure to read values from the document.

SAX parsers operate differently. They parse the document element by element. Each token in the XML file generates an event. To read values from the XML document, you must hook these events.

Xerces supports DOM 2.0 and SAX 2.0. Each model has strengths and weaknesses. The list below highlights some of the key differences between the two.

  • The DOM and SAX specifications both consist of interfaces.
  • Xerces provides C++ classes that implement all of the DOM interfaces. To parse a document with DOM, you simply use the Xerces DOM classes.
  • Xerces provides C++ classes that implement many of the SAX interfaces. However, to parse a document with SAX, you typically have to implement one of those interfaces yourself (ContentHandler).
  • The DOM parser processes the entire XML document before control is returned to your code. The DOM parser returns a Document object that represents the entire tree structure of the XML document.
  • The SAX parser calls methods of your custom ContentHandler as it parses the XML document.
  • The DOM parser is typically easier to use, but its use incurs a greater burden on the system because the entire tree structure resides in memory.
  • SAX parsers require more work on your part, but they consume less memory than DOM parsers.
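
The callback style of SAX can be sketched without Xerces. The toy example below assumes a hypothetical TagHandler interface, a scan_tags function, and a DepthCounter handler. None of these are Xerces names, and the scanner understands only bare <name> and </name> tags, with no attributes, text, comments, or error handling. It is meant only to show how a handler receives events while the document is being read, keeping just the state it needs instead of a full tree.

```cpp
#include <string>

// Hypothetical handler interface, loosely modeled on the idea of SAX
// content callbacks. This is not the Xerces ContentHandler class.
class TagHandler
{
public:
    virtual ~TagHandler() {}
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Toy scanner: reports each <name> and </name> it finds, in document order.
void scan_tags(const std::string& text, TagHandler& handler)
{
    std::string::size_type pos = 0;
    while ((pos = text.find('<', pos)) != std::string::npos)
    {
        std::string::size_type close = text.find('>', pos);
        if (close == std::string::npos)
            break;   // unterminated tag: stop scanning

        std::string tag = text.substr(pos + 1, close - pos - 1);
        if (!tag.empty() && tag[0] == '/')
            handler.endElement(tag.substr(1));
        else
            handler.startElement(tag);

        pos = close + 1;
    }
}

// A handler that keeps three integers instead of an in-memory tree.
class DepthCounter : public TagHandler
{
public:
    DepthCounter() : elements(0), depth(0), maxDepth(0) {}

    virtual void startElement(const std::string&)
    {
        ++elements;
        ++depth;
        if (depth > maxDepth)
            maxDepth = depth;
    }

    virtual void endElement(const std::string&)
    {
        --depth;
    }

    int elements;   // number of start tags seen
    int depth;      // current nesting level
    int maxDepth;   // deepest nesting level seen so far
};
```

A DOM-style approach would instead build a node for every element and hand the finished tree back to the caller, which is more convenient to navigate but holds the whole document in memory at once.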

Tip Note:

DOM is a specification that is governed by W3C. SAX is technically not a standard. It is controlled by a group of open source volunteers and is not affiliated with W3C.


3.3.1 DOM Parser

Reading an XML document with the Xerces DOM parser consists of several steps:

  1. Call XMLPlatformUtils::Initialize() to initialize the Xerces engine.
  2. Create a DOMParser object.
  3. Configure the DOMParser by calling its member functions.
  4. Call DOMParser::parse to parse an XML file.
  5. Retrieve the DOM document object by calling DOMParser::getDocument()
  6. Work with the DOM document object as needed.
  7. Call XMLPlatformUtils::Terminate() to shutdown the Xerces engine.

Listing 3.1 shows a minimal DOM parsing example.

//-----------------------------------------------------------------------------
//Listing 3.1: dom-minimal/main.cpp
#pragma hdrstop

#include <iostream>
#include <util/PlatformUtils.hpp>
#include <parsers/DOMParser.hpp>
#include <dom/DOM.hpp>
using namespace std;

int main()
{
    // NOTE: error handling is minimal in this example. Typically, Initialize
    //       should also be inside a try block, and the handlers should catch
    //       specific exception types such as XMLException.

    XMLPlatformUtils::Initialize();
    try
    {
        DOMParser parser;
        parser.setValidationScheme(DOMParser::Val_Auto);
        parser.setIncludeIgnorableWhitespace(false);
        parser.parse("test.xml");

        DOM_Document doc = parser.getDocument();
        // Add code that works with the DOM_Document.
    }
    catch(...)
    {
        cout << "An error occurred" << endl;
    }

    XMLPlatformUtils::Terminate();
    return 0;
}
//-----------------------------------------------------------------------------

Listing 3.1 demonstrates how to initialize the Xerces engine, create a DOM parsing object, and how to parse an XML file. However, it doesn't show how to interact with the DOM tree once the parser is finished. Listing 3.2 contains a more complete example. This example is a GUI project that walks the DOM tree and adds each DOM node to a TTreeView control. Once the iteration is complete, the tree control resembles the structure of the in-memory DOM tree.

//-----------------------------------------------------------------------------
//Listing 3.2: dom-treewalker/dommain.cpp
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop

#include "dommain.h"

#include <util/PlatformUtils.hpp>
#include <parsers/DOMParser.hpp>
#include <dom/DOM.hpp>

#include <string>
#include <sstream>
using namespace std;

//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------

// DOM string utilities
std::string DOMStringToStdString(const DOMString& s);
AnsiString DOMStringToAnsiString(const DOMString& s);
ostream& operator<< (ostream& target, const DOMString& s);

//DOM tree walking routines
void WalkTreeItem(TTreeNodes *nodes, TTreeNode *parent,
                  const DOM_Node &domnode);
void WalkTree(TTreeView *tree, DOM_Node &node);

__fastcall TForm1::TForm1(TComponent* Owner)
    : TForm(Owner)
{
    try
    {
        XMLPlatformUtils::Initialize();
    }
    catch(const XMLException& toCatch)
    {
        ShowMessage("Error during Xerces-c Initialization.\n");
        Application->Terminate();
    }
}

__fastcall TForm1::~TForm1()
{
    XMLPlatformUtils::Terminate();
}

//---------------------------------------------------------------------------
void __fastcall TForm1::BrowseButtonClick(TObject *Sender)
{
    if(ExtractFileExt(FileNameEdit->Text).Length() !=0)
    {
        OpenDialog->InitialDir = "";
        OpenDialog->FileName   = FileNameEdit->Text;
    }
    else
    {
        OpenDialog->FileName   = "";
        OpenDialog->InitialDir = FileNameEdit->Text;
    }
    if(OpenDialog->Execute())
        FileNameEdit->Text = OpenDialog->FileName;
}
//---------------------------------------------------------------------------
void __fastcall TForm1::ParseButtonClick(TObject *Sender)
{
    AnsiString xmlfile = FileNameEdit->Text;
    if(!FileExists(xmlfile))
    {
        ShowMessage("File does not exist.");
        return ;
    }

    Memo1->Lines->LoadFromFile(xmlfile);
    DOMParser parser;
    parser.setValidationScheme(DOMParser::Val_Auto);
    parser.setIncludeIgnorableWhitespace(false);

    parser.parse(xmlfile.c_str());  // note: ignoring exceptions, should catch
                                    // XMLException and DOM_DOMException types

    // Walk the DOM document tree and put each node into the treeview
    DOM_Document doc = parser.getDocument();
    WalkTree(TreeView1, doc);
}

//---------------------------------------------------------------------------
std::string DOMStringToStdString(const DOMString& s)
{
    // note: this would be a good place to use boost::scoped_array
    char *p = s.transcode();
    std::string result (p);
    delete [] p;
    return result;
}

AnsiString DOMStringToAnsiString(const DOMString& s)
{
    // note: this would be a good place to use boost::scoped_array
    char *p = s.transcode();
    AnsiString result (p);
    delete [] p;
    return result;
}

ostream& operator<< (ostream& target, const DOMString& s)
{
    target << DOMStringToStdString(s);
    return target;
}

void WalkTreeItem(TTreeNodes *nodes, TTreeNode *parent,
                  const DOM_Node &domnode)
{
    AnsiString  nodeName  = DOMStringToAnsiString(domnode.getNodeName());
    AnsiString  nodeValue = DOMStringToAnsiString(domnode.getNodeValue());

    switch(domnode.getNodeType())
    {
        case DOM_Node::TEXT_NODE:
        case DOM_Node::ATTRIBUTE_NODE:
        {
            if(!nodeValue.IsEmpty())
            {
                TTreeNode * newnode = nodes->AddChild(parent, nodeName +
                                      " : " + nodeValue);
                DOM_Node child = domnode.getFirstChild();
                while( child != 0)
                {
                    WalkTreeItem(nodes, newnode, child);
                    child = child.getNextSibling();
                }
            }
            break;
        }

        case DOM_Node::ELEMENT_NODE :
        {
            TTreeNode * newnode = nodes->AddChild(parent, nodeName +
                                  " : " + nodeValue);
            DOM_NamedNodeMap attrs = domnode.getAttributes();
            for (size_t i = 0; i < attrs.getLength(); ++i)
            {
                WalkTreeItem(nodes, newnode, attrs.item(i));
            }

            DOM_Node child = domnode.getFirstChild();
            while( child != 0)
            {
                WalkTreeItem(nodes, newnode, child);
                child = child.getNextSibling();
            }
            break;
        }

        case DOM_Node::DOCUMENT_NODE :
        case DOM_Node::XML_DECL_NODE:
        case DOM_Node::COMMENT_NODE:
        {
            TTreeNode * newnode = nodes->AddChild(parent, nodeName +
                                  " : " + nodeValue);

            DOM_Node child = domnode.getFirstChild();
            while( child != 0)
            {
                WalkTreeItem(nodes, newnode, child);
                child = child.getNextSibling();
            }
            break;
        }

        // Ignore any other type of nodes
        case DOM_Node::DOCUMENT_TYPE_NODE:
        case DOM_Node::PROCESSING_INSTRUCTION_NODE :
        case DOM_Node::ENTITY_REFERENCE_NODE:
        case DOM_Node::CDATA_SECTION_NODE:
        case DOM_Node::ENTITY_NODE:
            break;

        default:
            throw EInvalidOperation("Unrecognized node type");
    }
}

void WalkTree(TTreeView *tree, DOM_Node &node)
{
    tree->Items->BeginUpdate();
    try
    {
        tree->Items->Clear();
        WalkTreeItem(tree->Items, 0, node);
        tree->FullExpand();
    }
    __finally
    {
        // Always end the update, even if WalkTreeItem throws.
        tree->Items->EndUpdate();
    }
}
//-----------------------------------------------------------------------------

This example contains the same initialization and parsing code that Listing 3.1 had. It also contains a function called WalkTree that populates a TTreeView control from the contents of a DOM_Node object. WalkTree calls another function called WalkTreeItem. This function actually performs the iteration by recursively calling itself for DOM nodes that have children.

There are a couple of points worth mentioning about the code in Listing 3.2. First, note that the Xerces library relies heavily on its own string class, DOMString. Xerces does not provide a mechanism for converting DOMString objects to std::string or AnsiString. You have to provide this conversion yourself in the form of helper routines. Listing 3.2 contains two helper functions for converting DOMString objects: DOMStringToStdString and DOMStringToAnsiString.

Secondly, notice the large switch block in WalkTreeItem. The DOM model represents nodes from the XML file as DOM_Node objects. There are many different types of DOM nodes. The root document object is a DOM node. Elements and attributes are also DOM nodes. The different node types all inherit from the DOM_Node base class. The getNodeType method of DOM_Node returns an enum value that indicates what type of node the object really is. The switch block reads the enum value to determine how the node should be added to the tree.
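
The recursion in WalkTreeItem can be easier to follow without the VCL and Xerces types in the way. The sketch below is a minimal stand-in, not Xerces code: Node, NodeType, and Flatten are hypothetical names invented for this illustration. It switches on the node type and recurses into the children in the same spirit as Listing 3.2, but it writes indented lines to a vector instead of adding nodes to a TTreeView.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for DOM_Node and its node-type enum (not Xerces types).
enum NodeType { ELEMENT, TEXT, COMMENT };

struct Node
{
    NodeType          type;
    std::string       name;
    std::string       value;
    std::vector<Node> children;

    Node(NodeType t, const std::string& n, const std::string& v)
        : type(t), name(n), value(v) {}
};

// Append one "name : value" line per node, indented two spaces per level.
void Flatten(const Node& node, int depth, std::vector<std::string>& out)
{
    switch (node.type)
    {
        case TEXT:
            // Text nodes with no value are skipped, along with their children.
            if (node.value.empty())
                return;
            break;

        case ELEMENT:
        case COMMENT:
            break;
    }

    const std::string indent(static_cast<std::size_t>(depth) * 2, ' ');
    out.push_back(indent + node.name + " : " + node.value);

    // Recurse into the children, one level deeper.
    for (std::size_t i = 0; i < node.children.size(); ++i)
        Flatten(node.children[i], depth + 1, out);
}
```

Calling Flatten(root, 0, lines) on a small tree fills lines with one entry per visited node, which is roughly what the tree view displays after WalkTree runs.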

3.3.2 SAX Parser

Reading an XML document with the Xerces SAX parser is similar to using the DOM parser. The key difference is that you need to implement a ContentHandler and pass an instance of your handler to the parser. Here are the steps:

  1. Create a class that implements the ContentHandler interface. The class DefaultHandler makes a good base class because it implements a number of other interfaces, including the ErrorHandler interface.
  2. Call XMLPlatformUtils::Initialize() to initialize the Xerces engine.
  3. Create a SAX2XMLReader object by calling the factory function XMLReaderFactory::createXMLReader. The result should be stored in a smart pointer of some kind (it is your responsibility to delete this object).
  4. Create an instance of your custom handler class (the class from step 1)
  5. Configure your handler object to be the content handler for the parser by calling SAX2XMLReader::setContentHandler
  6. Call the parse method of the SAX2XMLReader object.
  7. Call XMLPlatformUtils::Terminate() to shut down the Xerces engine.
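
To make step 1 more concrete without pulling in Xerces, here is a minimal sketch of the callback idea behind SAX. Everything in it is a hypothetical stand-in: EventHandler plays the role of ContentHandler, DepthTracker is the custom handler, and FakeParse simulates a parser by firing a fixed sequence of events rather than reading an XML file.

```cpp
#include <string>

// Hypothetical stand-in for the ContentHandler interface (not a Xerces class).
class EventHandler
{
public:
    virtual ~EventHandler() {}
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// A custom handler: counts elements and records the deepest nesting level.
class DepthTracker : public EventHandler
{
public:
    DepthTracker() : elements(0), depth(0), maxDepth(0) {}

    virtual void startElement(const std::string&)
    {
        ++elements;
        if (++depth > maxDepth)
            maxDepth = depth;
    }

    virtual void endElement(const std::string&) { --depth; }

    int elements;   // number of startElement events seen
    int depth;      // current nesting level
    int maxDepth;   // deepest nesting level seen so far
};

// Simulates the events a parser would report for <a><b/><c><d/></c></a>.
void FakeParse(EventHandler& handler)
{
    handler.startElement("a");
    handler.startElement("b");
    handler.endElement("b");
    handler.startElement("c");
    handler.startElement("d");
    handler.endElement("d");
    handler.endElement("c");
    handler.endElement("a");
}
```

The parser never hands back a tree. The handler sees each event once and keeps only what it needs, which is the main trade-off compared with the DOM approach.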

Listing 3.3 contains a simple SAX parser.

//------------------------------------------------------------------------------
//Listing 3.3: sax-minimal/main.cpp
#pragma hdrstop

#include <iostream>
#include <memory>
using namespace std;

#include <util/PlatformUtils.hpp>
#include <sax2/SAX2XMLReader.hpp>
#include <sax2/XMLReaderFactory.hpp>
#include <sax2/DefaultHandler.hpp>

class SAX2Handler : public DefaultHandler // DefaultHandler inherits from
{                                         // ContentHandler and ErrorHandler
public:
    SAX2Handler()  {}
    ~SAX2Handler() {}

    //  Overrides for ContentHandler interface
    virtual void startElement(const XMLCh* const uri,
                              const XMLCh* const localname,
                              const XMLCh* const qname,
                              const Attributes& attrs);
    virtual void endElement(const XMLCh* const uri,
                            const XMLCh* const localname,
                            const XMLCh* const qname);

    virtual void characters(const XMLCh* const chars,
                            const unsigned int length);
    virtual void startDocument();
    virtual void endDocument();

    //  Overrides for ErrorHandler interface
    void warning(const SAXParseException& exception);
    void error(const SAXParseException& exception);
    void fatalError(const SAXParseException& exception);
};

void SAX2Handler::startElement(const XMLCh* const uri,
                               const XMLCh* const localname,
                               const XMLCh* const qname,
                               const Attributes& attrs)
{
    wcout << "startElement: "<< uri << ',' << localname << ','
          << qname << endl;
}

void SAX2Handler::endElement(const XMLCh* const uri,
                             const XMLCh* const localname,
                             const XMLCh* const qname)
{
    wcout << "endElement: "<< uri << ',' << localname << ','
          << qname << endl;
}

void SAX2Handler::characters(const XMLCh* const chars,
                             const unsigned int length)
{
    wcout << "characters: "<< chars << ',' << length << endl;
}

void SAX2Handler::startDocument()
{
    cout << __FUNC__ << endl;
}

void SAX2Handler::endDocument()
{
    cout << __FUNC__ << endl;
}

void SAX2Handler::warning(const SAXParseException& exception)
{
    wcout << __FUNC__ << " : " << exception.getLineNumber() << ','
          << exception.getColumnNumber() << " : " << exception.getMessage()
          << endl;
}

void SAX2Handler::error(const SAXParseException& exception)
{
    wcout << __FUNC__ << " : " << exception.getLineNumber() << ','
          << exception.getColumnNumber() << " : " << exception.getMessage()
          << endl;
}

void SAX2Handler::fatalError(const SAXParseException& exception)
{
    wcout << __FUNC__ << " : " << exception.getLineNumber() << ','
          << exception.getColumnNumber() << " : " << exception.getMessage()
          << endl;
}

int main()
{
    try
    {
        XMLPlatformUtils::Initialize();
        auto_ptr<SAX2XMLReader> parser (XMLReaderFactory::createXMLReader());

        // Create a handler object and install it as the content handler and
        // as the error handler.
        SAX2Handler handler;
        parser->setContentHandler(&handler);
        parser->setErrorHandler(&handler);
        parser->parse("test.xml");
    }
    catch (const XMLException& e)
    {
        wcout << "An error occurred : " << e.getType() << endl
              << e.getMessage() <<  endl;
    }

    // And call the termination method
    XMLPlatformUtils::Terminate();
}
//------------------------------------------------------------------------------

Tip Note:

At the time of this writing, if you compile and run Listing 3.3 with STLport selected as your STL implementation, the program will appear to do nothing. This is because wcout is essentially broken in BCB6 when using STLport. None of the wcout statements print to the console. wcout just seems to do nothing.

If you globally define the value _USE_OLD_RW_STL, then the program works fine. Hopefully, Borland will provide a patch in the near future that addresses this bug.
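
Another way around the problem, offered only as a sketch, is to convert the wide strings to narrow ones and print them with cout. The helper below is hypothetical (Narrow is not part of Xerces or the runtime library) and assumes that XMLCh is wchar_t on your compiler, as Listing 3.3 implicitly does when it streams XMLCh pointers to wcout. It keeps 7-bit ASCII characters and replaces everything else with '?', so it is only suitable for diagnostic output.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: narrow a null-terminated wide string for use with cout.
// Characters outside 7-bit ASCII are replaced with '?'.
std::string Narrow(const wchar_t* text)
{
    std::string result;
    if (text == 0)
        return result;

    for (std::size_t i = 0; text[i] != L'\0'; ++i)
    {
        // Work with an unsigned value so the range check behaves the same
        // whether wchar_t is signed or unsigned on the target compiler.
        const unsigned long code = static_cast<unsigned long>(text[i]);
        result += (code < 128) ? static_cast<char>(code) : '?';
    }
    return result;
}
```

With this in place, a handler could write cout << "startElement: " << Narrow(localname) << endl; instead of using wcout.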


Tip Note:
Xerces includes a number of example projects that you might want to look at. You can compile the examples from the same project group that you use to build the Xerces DLL.

3.4 Links to Xerces resources

Xerces links
  • Xerces C++ home page - http://xml.apache.org/xerces-c/index.html
  • Xerces C++ FAQs - http://xml.apache.org/xerces-c/faqs.html
  • Xerces stable download location - http://xml.apache.org/dist/xerces-c/stable/
  • Xerces class hierarchy - http://xml.apache.org/xerces-c/apiDocs/hierarchy.html

XML links
  • FAQs at xml.org - http://www.xml.org/xml/xmlfaq.shtml
  • DOM homepage at w3c.org - http://www.w3.org/DOM/
  • DOM FAQs - http://www.w3.org/DOM/faq
  • DOM 2 specification - http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/
  • SAX homepage - http://www.saxproject.org/
  • XML resource page - http://www.computer.org/internet/xml/index.htm

Books
  • XML in a Nutshell - http://www.amazon.com/exec/obidos/ASIN/0596000588/
  • Essential XML: Quick Reference - http://www.amazon.com/exec/obidos/ASIN/0201740958/

    4: ACE and TAO


    4.1 Introduction

    ACE is a C++ library for building distributed network systems. It includes classes for communicating through a socket connection, creating threads, working with memory mapped files, and a variety of other tasks. One of the key benefits of ACE is the fact that it is cross platform. ACE is written in C and C++, and has been ported to a variety of platforms, including Windows, Linux, Unix, and embedded operating systems such as VxWorks. To achieve this portability, ACE utilizes a layered approach. ACE employs a facade wrapper that encapsulates OS specific calls. As long as you stick to the facade wrapper, your code should port easily to other platforms.

    TAO is a CORBA orb that is built on top of ACE. Like ACE, it is open source, free, and cross platform. TAO is a very attractive choice when compared with expensive orbs such as Iona's Orbix and Borland's own Visibroker.

    4.2 Installing ACE and TAO

    Instructions for building ACE and TAO with C++Builder are available online; see the "Installing ACE with C++Builder" and "TAO with C++Builder" links in section 4.4.

    Both ACE and TAO provide makefiles for building the libraries with Borland C++Builder. These makefiles work fine with BCB6. However, the instructions do not discuss how to configure the BCB6 IDE for ACE and TAO. To configure BCB6, follow the instructions below. Note that these instructions duplicate much of the information from those two sources.

    1. Download ACE and TAO (http://deuce.doc.wustl.edu/Download.html)
    2. Extract the ZIP file to a suitable location (for example, e:\code\lib\ace)
    3. Create a file in the ace_wrappers\ace directory called config.h. Add a single #include statement to this file:
      #include "ace/config-win32.h"
    4. Open a console window and navigate to the ace_wrappers\ace directory.
    5. Create an environment variable called ACE_ROOT. This environment variable is used to build the ACE and TAO libraries. Set its value to the ace_wrappers directory:
       
      set ACE_ROOT=e:\code\lib\ace\ace_wrappers
       
    6. Build the ACE library by running make:
       
      make -f makefile.bor
       
    7. Install the ACE libraries by running make again. This time, you need to pass some extra arguments to make:
       
      make -f makefile.bor -DINSTALL_DIR=e:\code\lib\ace install
       
      I have chosen to install to e:\code\lib\ace, but you may want to pick a different location. The makefile allows you to specify the install directory. It will then create subdirectories in that location for holding the ACE files. For example, the makefile will create the following directory structure if you install to c:\acetao like the ACE documentation suggests:
      c:\acetao
        -> bin
           - ace_*.dll
           - ace_*.tds
        -> include
           -> ace
              - (ace include files here)
        -> lib
           - ace*.lib (a total of 3 LIB files)
      
      You need to decide where to put these files. The include and lib directories will eventually be added to your BCB project options. The DLLs in the bin directory will need to be on your system path. Here are some choices:
      1. Install to c:\acetao, like the ACE documentation says. You will need to manually add c:\acetao\bin to the system path.
      2. Install to the BCB installation directory. You won't need to modify your path because $(BCB)\bin is already on your path.
      3. Install to some other location, but copy the DLLs to system32, $(BCB)\bin, or $(BCB)\projects\bpl.
      I have chosen the third option by installing the ACE files in e:\code\lib\ace and manually copying the DLLs to $(BCB)\projects\bpl.
    8. Change directories to ace_wrappers\tao
    9. Run make to build the TAO library.
       
      make -f makefile.bor
       
    10. Install the TAO include files, library files, and DLLs using the same command from step 7:
       
      make -f makefile.bor -DINSTALL_DIR=e:\code\lib\ace install
       
    11. Ensure that the ACE and TAO DLLs are on the system path. You can do this by either moving the DLLs to a directory that is already on the path, or by adding the bin directory to the system path (e.g. e:\code\lib\ace\bin).
    12. Launch BCB and open Tools-Environment Options. On the Environment Variables tab, create a new environment variable by clicking New. Set the variable name to ACE_TAO and set the value to the same directory that you passed to make in steps 7 and 10 (i.e. e:\code\lib\ace).

    At this point, ACE and TAO will be installed on your system. To use ACE or TAO in a BCB project, modify your include and library paths so they point to the ACE directories. Add $(ACE_TAO)\include to your include path and add $(ACE_TAO)\lib to your library path. You will also need to add the correct LIB files to your project (ACE_bp.lib and TAO_bp.lib at a minimum).

    4.3 ACE and TAO examples

    4.3.1 A simple ACE socket client

    ACE provides a wide variety of services that you can utilize from your C++ projects. One of the core ACE services is a set of platform independent, socket wrapper classes. The ACE socket wrappers act as a facade that encapsulates OS specific socket calls. The ACE socket facade shields your application from inconsistencies between the Win32 socket API and the Unix socket API.

    Listing 4.1 contains a simple socket client built using the ACE framework.

    //------------------------------------------------------------------------------
    //Listing 4.1: ace-client/main.cpp
    #pragma hdrstop
    
    #include <cstring>
    #include <iostream>
    using namespace std;
    
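    #include "ace/SOCK_Connector.h"   // header names match the case used by ACE
    #include "ace/SOCK_Stream.h"      // so the code also builds on
    #include "ace/INET_Addr.h"        // case-sensitive file systems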
    
    #pragma argsused
    int main(int argc, char* argv[])
    {
        ACE_SOCK_Connector connector;
        ACE_SOCK_Stream    stream;
        ACE_INET_Addr      address;
    
        // connect to the web server
        if(address.set(80,"www.bcbdev.com") == -1)
            return 1;
        if(connector.connect(stream, address) == -1)
            return 1;
    
        // perform an http get
        const char* message = "GET /index.html HTTP/1.0\r\n\r\n";
        stream.send_n(message, strlen(message));
    
        ssize_t count=0;
        const size_t BUFSIZE=4096;
        char buff[BUFSIZE];
    
        while( (count=stream.recv(buff, BUFSIZE)) > 0)
        {
            cout.write(buff, count);
        }
    
        return stream.close();
    
    }
    //------------------------------------------------------------------------------

    There are three ACE classes involved in this example: ACE_SOCK_Connector, ACE_SOCK_Stream, and ACE_INET_Addr. The ACE_SOCK_Connector class establishes a connection with a remote server. ACE_INET_Addr abstracts the concept of an internet address and port. Many of the other ACE classes rely on ACE_INET_Addr. Lastly, the ACE_SOCK_Stream class represents the actual data that is exchanged between the server and the client. You can send or receive data with an instance of ACE_SOCK_Stream, provided that you have first established a connection.
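
Listing 4.1 writes the raw reply to cout, so the HTTP status line and headers appear ahead of the page itself. If you collect the received bytes in a std::string instead, a small helper can separate the two. The sketch below is plain C++ with no ACE dependency. HttpBody is a hypothetical name, and the code assumes that the whole response is already in memory and that the server ends its headers with a blank line (CRLF CRLF).

```cpp
#include <string>

// Hypothetical helper: return the part of an HTTP response that follows the
// blank line ending the headers, or an empty string if no blank line is found.
std::string HttpBody(const std::string& response)
{
    const std::string separator = "\r\n\r\n";
    const std::string::size_type pos = response.find(separator);
    if (pos == std::string::npos)
        return std::string();

    // Skip past the separator itself so only the body remains.
    return response.substr(pos + separator.size());
}
```

In the receive loop you would append each chunk with response.append(buff, count) and call HttpBody(response) once the loop finishes.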

    Tip Note:

    There is a bit of overlap between ACE and the INDY components that come with BCB6. Both libraries allow you to open a socket connection between a client and a server. However, the similarities pretty much end right there. The bullet lists below highlight some of the pros and cons of the INDY library.

    INDY Pros
    • Comes in the form of VCL components (probably easier to use)
    • Ships with C++Builder 6, no special installation steps
    • Provides higher level protocol components such as HTTP, FTP, SMTP, etc
    INDY Cons
    • Less mature and less stable than ACE
    • Written in Object Pascal
    • Arguably not as well designed as ACE
    • Doesn't abstract pipes or memory mapped files
    • Written in Object Pascal
    • Weaker threading support
    • No CORBA orb
    • Written in Object Pascal
    • Not available on as many platforms as ACE
    • Written in Object Pascal

    For the most part, INDY and ACE solve different problems with a small amount of overlap. If your application needs to send an email or retrieve a file from an FTP or HTTP server, then INDY is probably the best way to go. If you need to build a complex, distributed system from the ground up, then ACE is probably the better tool. If portability is a concern, then ACE is the clear front runner.


    4.3.2 An ACE threading example

    ACE also provides classes that help you create multithreaded applications. In fact, it offers a variety of classes, ranging from simple threading to advanced thread pool management. Listing 4.2 contains a simple example of the ACE threading capabilities.

    //------------------------------------------------------------------------------
    //Listing 4.2: ace-thread/main.cpp
    #pragma hdrstop
    
    #include <cstdlib>
    #include <iostream>
    using namespace std;
    
    #include "ace/Thread_Manager.h"
    #include "ace/Synch.h"
    
    int seed_value=0;
    #pragma argsused
    void * thread_func(void *arg)
    {
        ACE_DEBUG((LM_DEBUG,"Thread %t started.\n"));
        ACE_OS::srand(seed_value++);
        int loop_count = ACE_OS::rand() % 10;   // use the ACE_OS wrapper consistently
        int delay;
        for(int j=1; j<=loop_count; ++j)
        {
           delay = ACE_OS::rand() % 4;
           ACE_DEBUG((LM_DEBUG,"Thread %t sleeping for %d seconds : %T\n",delay));
           ACE_OS::sleep(delay);
    
           delay = ACE_OS::rand() % 4;
           ACE_DEBUG((LM_DEBUG,"Thread %t awake for %d seconds    : %T\n",delay));
           ACE_Time_Value timeout(ACE_OS::gettimeofday());
           timeout += delay;
           while(ACE_OS::gettimeofday() < timeout)
             ;
        }
    
        ACE_DEBUG((LM_DEBUG," - Thread %t shutting down.\n"));
        return 0;
    }
    
    #pragma argsused
    int main(int argc, char* argv[])
    {
        const int thread_count=2;
        ACE_Thread_Manager::instance()->spawn_n(thread_count,
                                               (ACE_THR_FUNC)thread_func);
        ACE_Thread_Manager::instance()->wait();
        ACE_DEBUG((LM_DEBUG,"All threads have finished.\nShutting down.\n"));
        return 0;
    }
    //------------------------------------------------------------------------------

    This example spawns 2 separate threads using the ACE_Thread_Manager class. The function thread_func acts as the main thread routine.

    Notice how the example code uses the OS wrapper facades in ACE_OS. Instead of calling an OS specific routine to make the thread sleep, the code calls the wrapper facade. This ensures portability because ACE_OS::sleep encapsulates and hides the OS specific routine.
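
The wrapper facade idea is easy to sketch in miniature. The code below is not how ACE implements ACE_OS::sleep. It is a simplified illustration with a hypothetical name (portable::sleep_ms), and it assumes only two targets: Win32, where Sleep takes milliseconds, and POSIX systems that provide nanosleep.

```cpp
#if defined(_WIN32)
#  include <windows.h>
#else
#  include <time.h>
#endif

namespace portable
{
    // Hypothetical facade: suspend the calling thread for roughly `ms`
    // milliseconds. Returns 0 on success and -1 on failure.
    int sleep_ms(unsigned long ms)
    {
#if defined(_WIN32)
        ::Sleep(static_cast<DWORD>(ms));   // Win32 call, does not report failure
        return 0;
#else
        timespec request;
        request.tv_sec  = static_cast<time_t>(ms / 1000);
        request.tv_nsec = static_cast<long>(ms % 1000) * 1000000L;
        return ::nanosleep(&request, 0);   // POSIX call, -1 if interrupted
#endif
    }
}
```

Code that calls portable::sleep_ms(250) never mentions the underlying OS routine, so the conditional compilation stays in one place. ACE applies the same principle across a much larger set of system calls.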

    4.3.3 A simple TAO CORBA server and client

    TAO includes a number of CORBA example programs ($(ACE_TAO)\ACE_wrappers\TAO\examples). You can compile the examples from the command line using make. However, it is also beneficial to see how to set up a TAO project in the IDE. The archive for this article includes the TAO time server example, complete with BCB6 project files.

    To create TAO BCB projects from scratch, follow these instructions.

    1. First, compile your IDL files with the TAO IDL compiler tao_idl.exe. This executable is created when you build TAO. Look for it in the $(ACE_TAO)\bin\Dynamic\Debug directory. For this example, the command to compile the IDL file is:
      e:\code\lib\ace\bin\Dynamic\Debug\tao_idl -Sc -Ge 1 Time.idl
      Replace the path as needed on your system. You will probably want to create a batch file for compiling the IDL.
    2. Creating the server
      1. Create a new, multithreaded console mode project that will act as the server. Add the IDL generated files to your project (TimeS.cpp and TimeC.cpp in this case).
      2. Add CORBA object implementations to your project (see Time_i.cpp and Time_i.h in this example)
      3. Add code to the server's main function that creates an instance of the CORBA object and performs any other necessary TAO housekeeping chores (see server.cpp)
      4. Add $(ACE_TAO)\include to the include path for the project
      5. Add $(ACE_TAO)\lib to the library path for the project
      6. Add any necessary ACE and TAO libraries to your project. In this example, the following LIB files were linked:
        • ace_bd.lib
        • tao_bd.lib
        • TAO_IORTable_bd.lib
        • TAO_PortableServer_bd.lib
        • TAO_Svc_Utils_bd.lib
        • TAO_CosNaming_bd.lib
      7. Build the server
    3. Creating the client
      1. Create a new, multithreaded console mode project for the client. Add the IDL generated, client stub files to your project (TimeC.cpp in this case).
      2. Add code to the client that connects to the CORBA server objects and executes their member functions (see Client.cpp, Time_Client_i.h, and Time_Client_i.cpp)
      3. Add $(ACE_TAO)\include to the include path for the project
      4. Add $(ACE_TAO)\lib to the library path for the project
      5. Add any necessary ACE and TAO libraries to your project. In this example, the following LIB files were linked:
        • ace_bd.lib
        • tao_bd.lib
        • TAO_IORTable_bd.lib
        • TAO_PortableServer_bd.lib
        • TAO_Svc_Utils_bd.lib
        • TAO_CosNaming_bd.lib
      6. Build the client

    Note that the steps for configuring a server project in the IDE are largely the same as those for configuring a client project. Also notice how vital environment variables are. By using environment variables, you can create an IDE project that is flexible with respect to where ACE and TAO reside. Without them, you would have to rely on hardcoded directory names.

    Tip Note:

    All ACE and TAO projects must link with the multithreaded runtime library. When you run the console wizard, make sure that you check the multithreaded checkbox.


    4.4 ACE and TAO resources

    In this article, we discussed how to install and configure ACE and TAO to work with C++Builder 6.0. However, we have not even begun to scratch the surface of what ACE and TAO are capable of. If you are interested in ACE or TAO, check out the following links and books.

  • ACE Homepage - http://www.cs.wustl.edu/~schmidt/ACE.html
  • Installing ACE with C++Builder - http://www.cs.wustl.edu/~schmidt/ACE_wrappers/ACE-INSTALL.html#borland
  • ACE Download page - http://deuce.doc.wustl.edu/Download.html
  • ACE 5.2 + TAO 1.2 ZIP file - http://deuce.doc.wustl.edu/ACE-5.2+TAO-1.2.zip
  • TAO Homepage - http://www.cs.wustl.edu/~schmidt/TAO.html
  • TAO with C++Builder (Chris Kohlhoff) - http://www.tenermerx.com/programming/corba/tao_bcb/index.html

    Books
  • C++ Network Programming: Mastering Complexity with ACE and Patterns - http://www.amazon.com/exec/obidos/ASIN/0201604647/
  • Pattern-Oriented Software Architecture, Vol 2 - http://www.amazon.com/exec/obidos/ASIN/0471606952/
  • Advanced CORBA Programming with C++ - http://www.amazon.com/exec/obidos/ASIN/0201379279/
  • TCP/IP Illustrated, Vol 1 - http://www.amazon.com/exec/obidos/ASIN/0201633469/


    5: Flex and Bison (Lex and Yacc replacements)


    5.1 Introduction

    Flex and Bison are replacements for Lex and Yacc (they are not totally compatible, but close). Flex generates lexical analyzers and Bison generates parsers. Both are open source tools that are distributed by the Free Software Foundation as part of the GNU project (gnu.org). Bison falls under GPL licensing, whereas Flex is licensed under a less restrictive BSD license (software from gnu.org usually falls under the GPL, but some programs, such as Flex, use a different license).

    Both Flex and Bison generate C and C++ source code. They take a configuration file and create C and C++ source files that you add to your project. The resulting C and C++ source code can be compiled and linked in your BCB projects.
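
To give a feel for the kind of work a Flex-generated lexer does, the sketch below is a tiny hand-written scanner in plain C++. It is not Flex output and is far simpler than a generated lexer. The Token struct, the TokenKind enum, and the Tokenize function are hypothetical names chosen for this illustration. It recognizes only identifiers, unsigned integers, and single-character operators.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum TokenKind { IDENTIFIER, NUMBER, OPERATOR };

struct Token
{
    TokenKind   kind;
    std::string text;
    Token(TokenKind k, const std::string& t) : kind(k), text(t) {}
};

// Split the input into tokens, skipping whitespace between them.
std::vector<Token> Tokenize(const std::string& input)
{
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size())
    {
        const unsigned char ch = static_cast<unsigned char>(input[i]);
        if (std::isspace(ch))
        {
            ++i;
            continue;
        }

        const std::size_t start = i;
        TokenKind kind = OPERATOR;
        if (std::isalpha(ch) || ch == '_')
        {
            kind = IDENTIFIER;
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) ||
                    input[i] == '_'))
                ++i;
        }
        else if (std::isdigit(ch))
        {
            kind = NUMBER;
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i])))
                ++i;
        }
        else
        {
            ++i;   // any other character is a one-character operator
        }
        tokens.push_back(Token(kind, input.substr(start, i - start)));
    }
    return tokens;
}
```

A Bison-generated parser would consume a token stream like this one and check it against the rules of a grammar.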

    Tip Note:

    Although the Free Software Foundation distributes Flex and Bison, you do not have to release your source code or the code that the tools generate under the GPL or LGPL licenses.


    5.2 Installation

    There are many ways to obtain Flex and Bison. You can download from the GNU website (see http://www.gnu.org/software/flex/ and http://www.gnu.org/software/bison/). However, if you download from the GNU website, you will have to build Flex and Bison from the source code. While this isn't that big of a deal, building Bison typically requires that you have a Unix-like environment, such as Cygwin. There are easier ways to obtain Flex and Bison that don't require you to compile them from scratch. One easy way is to download the precompiled Cygwin versions of Flex and Bison.

    Sections 5.2.1 and 5.2.2 describe how to install Cygwin and how to patch Flex. However, to save time, the Flex and Bison binary executables are included with this article (available on the conference CD and at http://www.bcbdev.com/ftp/source/flex-bison.zip). Both executables were compiled with the Cygwin GNU C compiler. Bison was compiled without any changes, but the Flex program was recompiled with a Borland compatible version of flex.skl. See section 5.2.2 below for more details.

    In order to compile the examples from this article with the least amount of effort, you should install the supplied versions of Flex and Bison. To install them:

    • Copy flex.exe, bison.exe, bison.simple, bison.hairy, and cygwin1.dll to a local directory that is on your path. $(BCB)\bin is a good choice.
    • Copy FlexLexer.h to your $(BCB)\include directory.
    • Consult the readme.txt file for information regarding copyrights and licensing.
    • For reference, the Borland compatible version of flex.skl is also supplied.

    After copying the Flex and Bison binaries, the next step is configuring them as build tools in the BCB IDE. Section 5.2.3 describes this process. If you have installed the supplied binary versions of Flex and Bison, you can skip to Section 5.2.3.

    5.2.1 Installing Flex and Bison with Cygwin

    Cygwin is a set of tools that combine to form a Unix like environment on a Windows PC. The Cygwin project is maintained by Red Hat. Cygwin includes sed, awk, grep, GNU C++, gzip, tar, less, vi, and a bash shell. It also includes Flex and Bison. Although Unix software has a notorious reputation for being difficult to install, Cygwin provides a setup program that makes it almost painless. To install Cygwin:

    1. Go to http://www.cygwin.com and click the Install Cygwin Now link (or navigate directly to http://www.cygwin.com/setup.exe). When your browser asks you to Open or Save the setup program, choose Save, and save the file to a new directory for Cygwin (for example, c:\cygwin).
    2. Launch the Cygwin setup.exe program from step 1 and step through the wizard.
    3. The second screen of the installer asks you how you want to install Cygwin. The choices are:
      • Install from internet (downloads and extracts the package archives, but does not save the archives after extraction)
      • Download from internet (downloads the package archives to your hard drive, but does not extract them)
      • Install from local directory (extracts previously downloaded archives)
      If you have not installed Cygwin previously, the easiest solution is the first option. Select this option and click next.
    4. The next page asks where you want to install Cygwin. Enter the same path from step 1 (ie c:\cygwin). Enter the other options as you see fit and then proceed to the next page.
    5. Next, the setup program will ask you to enter a local package directory. The default value is fine, so just click next.
    6. The next screen of the wizard asks you to enter connection information about your network. Enter the information for your system and click next.
    7. On the next screen, select a suitable FTP site and click next.
    8. After choosing an FTP site, Cygwin will download some configuration information. When it finishes, a screen will appear where you can choose which Cygwin packages you want to install. I recommend that you install all of them, plus the source code for Flex and Bison. At a minimum, you should install Flex and Bison, which reside in the Devel node in the tree. Note that the default action on the package selection screen is to install only the base set of Unix utilities. This default configuration will not install Flex or Bison. Click Next after you have finished selecting packages.
    9. At this point, the setup program begins downloading and installing the Cygwin packages. When the download is complete, finish the wizard.
    10. Copy the Flex header file FlexLexer.h from the c:\cygwin\usr\include directory to $(BCB)\include

    When you launch the Cygwin Bash shell, all of the Cygwin tools reside on the system path in that shell instance. However, Cygwin does not add its tools to the Windows path. We can use Flex and Bison from the BCB IDE without having them on the system path. However, if you want to perform command line builds with make.exe, you may want to create a batch file that adds the Cygwin tools to the Windows path. Another option would be to simply copy flex.exe and bison.exe to an existing directory that is already on the path, such as $(BCB)\bin. If you do this, make sure you also copy cygwin1.dll and cygintl-1.dll to a directory on the system path.

    Tip Note:

    If you forget to install something while running the Cygwin setup program, you can always install it by running setup again. The Cygwin installer remembers which packages you have installed, and it allows you to easily add more programs whenever you want.


    5.2.2 Patching Flex

    Tip Note:

    The version of Flex.exe on the conference CD and on the website has already been patched. You do not have to follow these steps if you installed the supplied version of Flex.


    Regardless of how you obtain Flex and Bison, you will eventually discover a small problem. Flex generates source code that is not compatible with modern C++ compilers. The main issue involves the "-+" command line option for Flex, which tells Flex to generate a lexer that works with C++ iostreams. The current version of Flex (2.5.4) creates a C++ source file that contains a forward declaration for istream. This causes problems because istream is not a class anymore; it is a typedef for basic_istream<char>.

    This isn't the only problem with Flex. Flex generates code that relies on a Unix/Linux header file called unistd.h. C++Builder does not provide this header file. The closest replacement is io.h. Flex also creates prototypes for the isatty function. Unfortunately, these prototypes clash with existing prototypes for the same function.
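
A common way to cope with the missing unistd.h, sketched here under stated assumptions, is to pick the header at compile time. This is an illustration, not the contents of the patched flex.skl. IsInteractive is a hypothetical helper, and the code assumes that io.h declares isatty on Win32 compilers, as the paragraph above indicates for C++Builder.

```cpp
#if defined(_WIN32)
#  include <io.h>       // assumed to declare isatty on Win32 compilers
#else
#  include <unistd.h>   // declares isatty on Unix-like systems
#endif

// Hypothetical helper: true when the file descriptor refers to a terminal.
// Relies on the prototype from the system header instead of declaring its own,
// which avoids the clashing-prototype problem described above.
bool IsInteractive(int fd)
{
    return isatty(fd) != 0;
}
```

For example, IsInteractive(0) reports whether standard input is attached to a console rather than redirected from a file.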

    In order to utilize Flex effectively, a few small changes need to be made. Fortunately, Flex is an open source library, so we can alter the way it generates source code. Not only is Flex open source, it is also well designed. We don't have to hunt through the Flex source code in order to make these changes. We simply change a configuration file that governs how Flex creates source code. This file is called flex.skl, and is part of the Flex source distribution.

    flex.skl governs how Flex generates code. To remove the include for unistd.h and the forward declaration for istream, we simply modify flex.skl and rebuild the executable. The ZIP file for this article contains a patched version of flex.skl. The file README.TXT describes the changes that were made to the file. To patch Flex, extract flex.skl to the local directory that contains the source code for Flex, overwriting the existing file of the same name (you may want to create a backup of the original first). After copying the file, rebuild Flex. The steps below describe how to rebuild Flex using Cygwin.

    1. Start the Cygwin bash shell.
    2. Change directories to /usr/src/flex-2.5.4-2. The exact directory may vary based on the current version of Flex. If a Flex source directory does not exist, run the Cygwin setup program again and install the Flex source code.
    3. Execute ./configure. This creates a makefile.
    4. Run make to rebuild Flex.
    5. Copy flex.exe to where the old version resides (/bin in Cygwin).

    5.2.3 Configuring Flex and Bison as Build Tools in BCB6

    Now that you have Flex and Bison installed on your system, it is time to put the tools to use. Recall that Flex and Bison process an input file and create C and C++ source files as their output. You can utilize the source output in several ways:

    1. You can run Flex and Bison separately from the command line and add the resulting source files to your BCB project. After this point, you would only run Flex and Bison as needed.
    2. You could create a make file and perform command line builds. The make file would contain rules for compiling the Flex and Bison config files to source code and compiling that source code to OBJ form with bcc32.
    3. You can configure Flex and Bison as build tools in the BCB6 IDE. This allows you to add Flex and Bison files to your IDE projects.

    The first option is adequate if you have an existing lexer or parser configuration that is unlikely to change very often. However, creating a parser is generally a trial and error process. If you have to change the config files often, then option 1 is going to grow old really fast. The second option is an excellent choice if you are familiar with make files. The drawback to using make files is that it can be difficult to debug your projects.

    This section describes how to utilize option 3. C++Builder 6 includes new support for configuring external build tools, such as Flex and Bison. After configuring the tools, you can add Flex and Bison config files directly to your IDE based projects.

    The steps below describe how to configure Flex and Bison as build tools:

    Configuring Flex
    1. Start BCB6.
    2. Open Tools-Build Tools and click Add
    3. Fill in the Edit Tool dialog
      1. Enter Flex Compiler as the title.
      2. In the Default Extensions box, enter *.flex;*.l (lower case L). This entry tells BCB that Flex files have either .flex or .l as their file extension. You can modify this list of extensions if you want to follow a different convention.
      3. Leave Other Extensions box blank.
      4. Set the Target Extension to cpp (no punctuation).
      5. The entry for the Command Line box will differ based on where flex.exe resides. If it is on your Windows path, enter:
        flex.exe -+ -o$TARGETNAME $PATH$NAME $SAVE
        If flex.exe is not on your path, then add the absolute path in front of flex.exe.
      6. Click Ok to close the box.
    Configuring Bison
    1. Open Tools-Build Tools and click Add
    2. Fill in the Edit Tool dialog
      1. Enter Bison Compiler as the title
      2. Enter *.bison;*.y as the default extensions.
      3. Set the Target Extension to cpp (no punctuation)
      4. The entry for the Command Line box will differ based on where bison.exe resides. If it is on your Windows path, enter bison.exe -d -o$TARGETNAME $PATH$NAME $SAVE. Precede the program name with a path if Bison is not already on the Windows path.
      5. Click Ok to close the box.
    3. Bison relies on two external files called bison.hairy and bison.simple. You need to tell Bison how to find these files. There are several ways to do this. In BCB6, the easiest way is to create two environment variables that Bison looks for. The variables are BISON_SIMPLE and BISON_HAIRY. Open Tools-Environment Options and go to the Environment Variables tab. Click the New button to create the environment variables (see Figure 2).

    Figure 1. Configuring Flex as a Build Tool


     


    Figure 2. Bison Environment Variables


     

    Note:

    The environment variables that you add from the Environment Options dialog are local to the BCB IDE and any process that it spawns, such as bison.exe. You may want to create system wide environment variables for Bison, especially if you will be performing command line builds that invoke Bison.


    BCB provides a variety of macros that you can enter in the Command Line box. The $SAVE macro tells the IDE to save the file before invoking the external tool. Almost every build tool should list this macro. $NAME contains the filename and extension of the input file, but does not include the path (i.e., $NAME = parser.flex). $PATH provides the path of the input file, complete with a trailing slash. $TARGETNAME contains the path and filename of the target file.
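To make the macros concrete, the sketch below expands a command line the way the descriptions above suggest. It is only an illustration: expand_macros and replace_all are hypothetical helpers, the paths are made up, and the IDE's real substitution logic may differ. In particular, the sketch assumes that $SAVE is an instruction to the IDE and contributes nothing to the command line.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces every occurrence of 'from' in 's' with 'to'.
static void replace_all(std::string &s, const std::string &from,
                        const std::string &to)
{
    assert(!from.empty());   // an empty search string would never terminate
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos)
    {
        s.replace(pos, from.size(), to);
        pos += to.size();
    }
}

// Hypothetical model of the IDE's macro expansion for one input file.
std::string expand_macros(std::string cmd, const std::string &path,
                          const std::string &name, const std::string &target)
{
    replace_all(cmd, "$TARGETNAME", target);  // path and filename of the output
    replace_all(cmd, "$PATH", path);          // input path, with trailing slash
    replace_all(cmd, "$NAME", name);          // input filename and extension
    replace_all(cmd, "$SAVE", "");            // assumption: consumed by the IDE
    return cmd;
}
```

With the Flex command line from this section, a path of C:\proj\, a name of lexer.flex, and a target of C:\proj\lexer.cpp, the function returns flex.exe -+ -oC:\proj\lexer.cpp C:\proj\lexer.flex followed by a single trailing space where $SAVE used to be.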

    Flex and Bison provide a variety of command line options. The -o option determines the output file name for both tools. The -+ Flex option tells Flex to generate a C++ lexer that works with C++ iostreams. The -d Bison option forces Bison to generate a header file. This header file will typically be included by the lexer. You can pass --help as a command line argument to both tools to see what options are available.

    Note:

    The Build Tool support in BCB6 is a nice addition, but it does have its limits. There is no easy way to change the command line options that are passed to the build tool. If you have a project where you use Flex but not Bison, you will probably want to use the -+ option because it tells Flex to generate a C++ lexer class. The C++ class interfaces with C++ iostreams, and is typically much easier to use. However, if you do need Bison, you probably won't want to create a C++ lexer because Bison doesn't know how to interface with it. In this case, you don't want to pass the -+ option to Flex.

    The tool support in BCB6 does not allow you to override specific options that are passed to the tool. If you need to alter how the IDE passes options to Flex, you will need to edit the Command Line field in the Edit Tool dialog.


    Note:

    If you utilize the Build Tools feature of BCB6, you may want to activate the ShowCommandLine compiler setting. With this setting on, BCB will print command strings as hints in the message window. This allows you to see exactly how the IDE invokes Flex and Bison.

    To activate the ShowCommandLine setting, run regedit and set HKEY_CURRENT_USER\Software\Borland\C++Builder\6.0\Compiler\ShowCommandLine to true.


    Once Flex and Bison are configured as build tools, you should be able to add Flex and Bison files to your BCB projects. If a project contains a Flex input file, the IDE will invoke flex.exe when you build the project. The source file that Flex creates can be used in two different ways. Your first option is to add the output CPP file to your project just like any other source file. If you follow this strategy, both the Flex input file and the Flex output file will be part of the project. The second option is to #include the Flex-generated CPP file from some other CPP file that is already in your project. Including a CPP file is usually not a good idea, but in this case, it makes sense.

    The examples that accompany this section demonstrate both techniques.

    5.3 Flex and Bison Examples

    5.3.1 A C++ Comment Stripper, Built with Flex

    You can solve many string processing problems with Flex alone. Bison generates code that parses and validates a syntax grammar. However, if you just need to convert an input text file to a different output format, then you probably won't need a full set of grammar rules. In this case, Flex alone may be sufficient.

    Listing 5.1 shows a Flex configuration file for implementing a C++ comment stripper. This lexer processes C++ source code and strips out any comments that it finds. Listing 5.2 contains the C++ code that interfaces to the lexer.

    %{
    // Listing 5.1: flex/comment-stripper/lexer.flex
    // Flex input file for building a program that strips C and C++
    // comments from a source file.
    
    #include <istream>
    #include <ostream>
    using namespace std;
    
    void parse(istream &in, ostream &out);
    %}
    
    %option noyywrap
    
    BEGIN_BLOCK_COMMENT "/*"
    END_BLOCK_COMMENT   "*/"
    
    %x  BlockComment
    %x  SingleLineComment
    
    %%
    
    <INITIAL>"//"  {
        BEGIN(SingleLineComment);
    }
    
    <INITIAL>{BEGIN_BLOCK_COMMENT}  {
        BEGIN(BlockComment);
    }
    
    <INITIAL>.|\n  ECHO;
    
    <SingleLineComment>\n {
        BEGIN(INITIAL);
    }
    
    <SingleLineComment>.  ;
    
    <BlockComment>{END_BLOCK_COMMENT} {
        BEGIN(INITIAL);
    }
    
    <BlockComment>.|\n  ;
    
    %%
    
    void parse(istream &in, ostream &out)
    {
        yyFlexLexer lexer(&in, &out);
        lexer.yylex();
    }
    //------------------------------------------------------------------------------
    //Listing 5.2: flex/main.cpp
    // Flex based C++ comment stripper
    // The comment you are reading should not survive the stripper.
    /* This comment shouldn't
     * survive either */
    #include <fstream>
    #include <iostream>
    using namespace std;
    
    // prototype for the parser function
    void parse(istream &in, ostream &out);
    
    #pragma argsused
    int main(int argc, char* argv[])
    {
    
        cout << "About to invoke the parser." << endl
             << "Note that the lexer is /*probably not*/ perfect!"   << endl
             << "--------------------------------------------------" << endl;
        ifstream sourcefile ("main.cpp");
        parse(sourcefile, cout);
    
        return 0;
    }
    //------------------------------------------------------------------------------

    Flex files contain three distinct sections, which are delimited by two percent signs (%%). The first section is a definition section. It contains regular expression definitions, state definitions, lexer specific directives, and raw C++ code (typically include directives and function prototypes). The middle section is the rules section. It contains lexing rules that tell Flex how to process text. The third and last section is called the user subroutines section. It holds any additional C++ code that is needed for the lexer.

    The definition section in Listing 5.1 starts by including the C++ stream header files. It also contains a prototype for a function called parse. Notice that the lexer file delimits raw C++ code with %{ and %}.

    %{
    // Listing 5.1: flex/comment-stripper/lexer.flex
    // Flex input file for building a program that strips C and C++
    // comments from a source file.
    
    #include <istream>
    #include <ostream>
    using namespace std;
    
    void parse(istream &in, ostream &out);
    %}

    The rest of the definition section looks like this:

    %option noyywrap
    
    BEGIN_BLOCK_COMMENT "/*"
    END_BLOCK_COMMENT   "*/"
    
    %x  BlockComment
    %x  SingleLineComment

    noyywrap is a directive that tells Flex not to look for more input after reaching the end of the input stream. BEGIN_BLOCK_COMMENT and END_BLOCK_COMMENT are regular expression definitions. The rules section relies on these definitions. The lines that begin with %x are state definitions. The lexer utilizes states to determine whether it is in the middle of a comment or real code.

    The middle section of the Flex file contains lexing rules. The lexing rules look like this:

    <INITIAL>"//"  {
        BEGIN(SingleLineComment);
    }
    
    <INITIAL>{BEGIN_BLOCK_COMMENT}  {
        BEGIN(BlockComment);
    }
    
    
    <INITIAL>.|\n  ECHO;
    
    <SingleLineComment>\n {
        BEGIN(INITIAL);
    }
    
    <SingleLineComment>.  ;
    
    <BlockComment>{END_BLOCK_COMMENT} {
        BEGIN(INITIAL);
    }
    
    <BlockComment>.|\n  ;

    The lexer in this example utilizes a state machine consisting of three states: BlockComment, SingleLineComment, and INITIAL. INITIAL is the starting state for the lexer. This state is implied and need not be listed. Each rule begins with a state name, listed inside < and > brackets, followed by a regular expression. The remaining text after the regex is C++ code. If the lexer encounters the expression while in a given state, it executes that C++ code.

    For example, the first lexing rule in this example was:

    <INITIAL>"//"  {
        BEGIN(SingleLineComment);
    }

    This rule states the following: if the lexer is in the INITIAL state and it encounters two slash characters, then switch to the SingleLineComment state. Note that BEGIN is a special directive that Flex understands. SingleLineComment was defined as a state using %x in the definition section.

    The second lexing rule looks a little different.

    <INITIAL>{BEGIN_BLOCK_COMMENT}  {
        BEGIN(BlockComment);
    }

    This rule says that if the lexer is in the INITIAL state and it encounters text that matches the regular expression BEGIN_BLOCK_COMMENT, then transition to the BlockComment state. The curly braces around BEGIN_BLOCK_COMMENT tell Flex that BEGIN_BLOCK_COMMENT is a regular expression that was defined in the definitions section. Without the curly braces, Flex would try to literally match the text BEGIN_BLOCK_COMMENT.

    Once the lexer encounters the beginning of a comment, it executes a state transition to either the SingleLineComment state or the BlockComment state. These two states each contain their own lexing rules. The rules for SingleLineComment look like this:

    <SingleLineComment>\n {
        BEGIN(INITIAL);
    }
    
    <SingleLineComment>.  ;

    The first rule states that if a newline character is encountered, then return to the INITIAL state. The second rule tells Flex to simply swallow all other characters. The . tells Flex to match any input character except a newline, and the semicolon all by itself tells Flex to simply do nothing with those characters (they won't be copied to the output).
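The state machine that these rules describe can also be written out by hand, which may help when reading the Flex file. The sketch below is not the code that Flex generates; strip_comments is a hypothetical function that mirrors the three states and the rules of Listing 5.1, including the fact that the newline ending a single line comment is not echoed.

```cpp
#include <string>

// The same three states that the Flex file uses.
enum State { INITIAL, SingleLineComment, BlockComment };

// Hypothetical hand-written equivalent of the rules in Listing 5.1.
std::string strip_comments(const std::string &in)
{
    std::string out;
    State state = INITIAL;
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        const char c    = in[i];
        const char next = (i + 1 < in.size()) ? in[i + 1] : '\0';
        switch (state)
        {
        case INITIAL:
            if (c == '/' && next == '/')      { state = SingleLineComment; ++i; }
            else if (c == '/' && next == '*') { state = BlockComment;      ++i; }
            else                              { out += c; }   // same as ECHO
            break;
        case SingleLineComment:
            if (c == '\n')                    // newline ends the comment and,
                state = INITIAL;              // as in the listing, is dropped
            break;
        case BlockComment:
            if (c == '*' && next == '/')      { state = INITIAL; ++i; }
            break;
        }
    }
    return out;
}
```

For the input "int a; // x\nint b; /* y */ int c;" the function returns "int a; int b;  int c;". Both comments disappear, along with the newline that ended the first one.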

    The last section of the Flex file contains any miscellaneous C++ that is needed by the lexer. In this example, the subroutines section defines the parse function. This routine takes two stream arguments. From those streams, parse constructs a C++ lexer object. The parse function activates the lexer by calling the yylex member function.

    The lexer from Listing 5.1 works, but it is not quite perfect. What happens if the lexer encounters C++ comment tokens inside a string literal? For example, how would the lexer handle the following input?

    cout << "The lexer is /*probably not*/ perfect!"   << endl;

    Based on the rules in the Flex file, the lexer would strip out the text /*probably not*/. To be truly useful, the lexing rules need to change to take this into account.
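One way to take string literals into account is to add a fourth state that copies everything up to the closing quote. The sketch below shows the idea as hand-written C++ rather than as Flex rules. The StringLiteral state and the function strip_comments_keep_strings are assumptions made for illustration; they are not part of the article's example project, and the sketch still ignores character literals such as '"'.

```cpp
#include <string>

enum State { INITIAL, SingleLineComment, BlockComment, StringLiteral };

// Hypothetical comment stripper that leaves string literals untouched.
std::string strip_comments_keep_strings(const std::string &in)
{
    std::string out;
    State state = INITIAL;
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        const char c        = in[i];
        const bool has_next = i + 1 < in.size();
        const char next     = has_next ? in[i + 1] : '\0';
        switch (state)
        {
        case INITIAL:
            if (c == '/' && next == '/')      { state = SingleLineComment; ++i; }
            else if (c == '/' && next == '*') { state = BlockComment;      ++i; }
            else
            {
                if (c == '"') state = StringLiteral;   // opening quote
                out += c;
            }
            break;
        case StringLiteral:
            out += c;                                  // copy literal text as is
            if (c == '\\' && has_next) { out += next; ++i; }  // escaped character
            else if (c == '"')         { state = INITIAL; }   // closing quote
            break;
        case SingleLineComment:
            if (c == '\n') state = INITIAL;
            break;
        case BlockComment:
            if (c == '*' && next == '/') { state = INITIAL; ++i; }
            break;
        }
    }
    return out;
}
```

With this extra state, the text /*probably not*/ survives when it appears between double quotes, while a real comment after the statement is still removed.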

    5.3.2 Evaluating a numerical expression

    Bison is a code generation tool that helps you write a parser that enforces a grammar. A Bison file contains rules that define the structure of the grammar. Bison generates code that accepts tokens from a lexer, and then matches those tokens to the grammar.

    In this section, we will discuss a program that uses Flex and Bison to parse a string that contains a simple numerical expression, such as 3 + 6 * 2 - (2 - 29). Flex handles the job of separating the input text into different tokens. Bison generates code that analyzes those tokens to make sure that they follow the grammar rules for a numerical expression. It also evaluates the expression.

    Listing 5.3 contains a Bison input file called parser.bison. Listing 5.4 contains a Flex lexer file that feeds tokens to the parser. Listing 5.5 contains a small C++ source file that tests the parser.

    Note:
    The calculator example uses a C based lexer instead of a C++ lexer. Before building the calculator project, you will need to remove the -+ command line option from the build tool settings for Flex.


    %{
    //--------------------------------------------------------------------------
    // Listing 5.3: flex/calculator/parser.bison
    // Bison input file for building a program that evaluates a
    // numerical expression.
    
    #include <cstdio>
    #include <iostream>
    using namespace std;
    
    inline void yyerror(const char *c)
    {
        cout << "Error!: "<< c << endl;
        return;
    }
    
    extern char *yytext;
    
    int yylex();
    
    %}
    
    %token  NUMBER
    %left '-' '+'
    %left '*' '/'
    %nonassoc UMINUS
    
    %%
    
    result:  expression  { yylval = $1; }
        ;
    
    
    expression: expression '+' expression   { $$ = $1 + $3; }
        |       expression '-' expression   { $$ = $1 - $3; }
        |       expression '*' expression   { $$ = $1 * $3; }
        |       expression '/' expression   { $$ = $1 / $3; }
        |       '-' expression %prec UMINUS { $$ = -$2;     }
        |       '(' expression ')'          { $$ = $2;      }
        |       NUMBER                      { $$ = $1;      }
        ;
    
    %%
    
    %{
    //--------------------------------------------------------------------------
    %}
    %{
    //--------------------------------------------------------------------------
    // Listing 5.4: flex/calculator/lexer.flex
    // Flex input file for building a program that evaluates a
    // numerical expression.
    
    #include <iostream>
    #include <cstdlib>
    using namespace std;
    
    #include "parser.hpp"
    
    void yy_input(char *, int &count, int max);
    #define YY_INPUT(buffer, count, max) yy_input(buffer,count,max)
    
    %}
    
    %option noyywrap
    
    delim                     [ \t]
    ws                        {delim}+
    letter                    [A-Za-z]
    digit                     [0-9]
    integer                   {digit}+
    float                     {digit}+\.{digit}+
    
    %%
    
    
    {integer} {
        // when we match the regular expression for an integer,
        // convert the string to an integer and return it to
        // the parser by assigning it to yylval.
        yylval = atoi(yytext);
        return NUMBER;
    }
    
    {ws}   {
         // eat whitespace
         ;
    }
    
    [\n]   {
        // when the end of line is encountered
        // return 0 to signal end of input
        return 0;
    }
    
    .      {
        // Return any other character to the parser
        // as a token. This rule handles '+' and '-'. It
        // also handles any invalid character
        return yytext[0];
    }
    
    
    %%
    
    // iter and end form an iterator range of [iter,end). This is the
    // range of characters to process. To parse an expression, these
    // two iterators should be set.
    const char * iter   = 0;
    const char * end    = 0;
    void yy_input(char *buf, int &count, int max)
    {
        if(max > 0)
        {
            const char * end_iter = min(iter+max, end);
            count = end_iter-iter;
            if(count)
            {
                copy(iter, end_iter, buf);
                iter+=count;
            }
        }
    }
    //--------------------------------------------------------------------------
    //------------------------------------------------------------------------------
    //Listing 5.5: flex/main.cpp
    #include <fstream>
    #include <iostream>
    #include <algorithm>
    #include <cstring>
    #pragma hdrstop
    using namespace std;
    
    
    // Just include the cpp files for compilation. The BCB6 build tools don't
    // support the ability to compile the output of flex or bison
    #include "parser.cpp"
    #include "lexer.cpp"
    
    int main()
    {
        const char * input = "3 + 6 * 2 - (2 - 29) \n";
        cout << "Input is : " << input << endl << endl;
    
        iter = input;
        end  = iter + strlen(input);
    
        yyparse();
    
        cout << "result is: " << yylval << endl;
    
        return 0;
    }
    //------------------------------------------------------------------------------

    Like Flex files, Bison files contain three sections: the definition section, the parsing rules section, and a section for writing C++ routines. In Listing 5.3, the key piece of the definition section looks like this:

    %token  NUMBER
    %left '-' '+'
    %left '*' '/'
    %nonassoc UMINUS

    The %token line defines the tokens that the lexer can pass to the parser. Note that single character tokens, such as '+' and '-', don't have to be listed with %token. The %left directives declare left-associative operators, and their order establishes precedence: operators declared on later lines bind more tightly. The tokens '+' and '-' therefore have a lower precedence than '*' and '/'. Unary negation (UMINUS), which is declared last, has the highest precedence.

    The grammar rules for the parser reside in the middle section of the Bison file. The rules for the calculator look like this:

    result:  expression  { yylval = $1; }
        ;
    
    
    expression: expression '+' expression   { $$ = $1 + $3; }
        |       expression '-' expression   { $$ = $1 - $3; }
        |       expression '*' expression   { $$ = $1 * $3; }
        |       expression '/' expression   { $$ = $1 / $3; }
        |       '-' expression %prec UMINUS { $$ = -$2;     }
        |       '(' expression ')'          { $$ = $2;      }
        |       NUMBER                      { $$ = $1;      }
        ;

    The first rule states that a result consists of one expression. When a complete expression is encountered, the parser assigns the result to yylval, a variable that is provided by the parser. The second rule defines what an expression is. Notice that an expression is a combination of one or more sub-expressions.

    Each parsing rule consists of a sequence of symbols, such as expression '+' expression, followed by an action, such as { $$ = $1 + $3; }. The action consists of C++ code. The names that start with '$' are special macros for Bison. Bison replaces $1 with the value of the first symbol in the rule, $2 with the value of the second symbol, $3 with the value of the third, and so on. Operators count as symbols, which is why the right-hand operand of expression '+' expression is $3. In this example, we don't have any rules with more than three symbols. The '$$' macro represents the resulting value of the rule.
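To see what the grammar and the precedence declarations accomplish, it can help to compare them with a hand-written evaluator. The sketch below uses recursive descent, which is a different technique from the table-driven parser that Bison generates. The class ExprEvaluator and its member names are hypothetical. Each precedence level becomes one function, and the loops provide left associativity.

```cpp
#include <cctype>
#include <string>

// Hypothetical recursive descent evaluator for the calculator grammar.
class ExprEvaluator
{
public:
    explicit ExprEvaluator(const std::string &text) : s(text), pos(0) {}
    int evaluate() { return expression(); }

private:
    std::string s;
    std::string::size_type pos;

    // Skips blanks and returns the next character without consuming it.
    char peek()
    {
        while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t'))
            ++pos;
        return pos < s.size() ? s[pos] : '\0';
    }

    // Lowest precedence: '+' and '-', left associative.
    int expression()
    {
        int value = term();
        for (char op = peek(); op == '+' || op == '-'; op = peek())
        {
            ++pos;
            const int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // Higher precedence: '*' and '/', left associative.
    int term()
    {
        int value = factor();
        for (char op = peek(); op == '*' || op == '/'; op = peek())
        {
            ++pos;
            const int rhs = factor();   // like the grammar, no zero check
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // Highest precedence: unary minus, parentheses, and numbers.
    int factor()
    {
        const char c = peek();
        if (c == '-') { ++pos; return -factor(); }
        if (c == '(')
        {
            ++pos;
            const int value = expression();
            if (peek() == ')') ++pos;
            return value;
        }
        int value = 0;
        while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
            value = value * 10 + (s[pos++] - '0');
        return value;
    }
};
```

ExprEvaluator("3 + 6 * 2 - (2 - 29)").evaluate() returns 42: the multiplication is performed first, and the parenthesized difference of -27 is then subtracted from 15.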

    Listing 5.4 contains the Flex lexer that feeds the parser. The lexing rules are relatively simple.

    {integer} {
        // when we match the regular expression for an integer,
        // convert the string to an integer and return it to
        // the parser by assigning it to yylval.
        yylval = atoi(yytext);
        return NUMBER;
    }
    
    {ws}   {
         // eat whitespace
         ;
    }
    
    [\n]   {
        // when the end of line is encountered
        // return 0 to signal end of input
        return 0;
    }
    
    .      {
        // Return any other character to the parser
        // as a token. This rule handles '+' and '-'. It
        // also handles any invalid character
        return yytext[0];
    }

    The lexer recognizes tokens and passes them to the parser. The first lexing rule identifies integer values. When a string is found that matches the integer regular expression, the lexer converts the string to an integer and stuffs the value into yylval. It then returns the NUMBER token to the parser.

    The lexer also looks for white space. If it finds a white space character, it simply throws it out because the parser is not interested in white space tokens. If a newline character is found, the lexer returns 0 to signal the end of the current input expression. Any character that does not match any of the other rules is returned directly to the parser as a token.
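The four lexing rules can be mimicked with a small hand-written tokenizer, which shows the token stream that the parser receives. The sketch below is an illustration only: Token and tokenize are hypothetical names, and the value of NUMBER is an arbitrary placeholder, since Bison assigns the real value in the generated header.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

const int NUMBER = 258;   // placeholder; the real value comes from parser.hpp

// Hypothetical token record: the token type plus the value that the
// Flex lexer would place in yylval.
struct Token
{
    int type;
    int value;
};

std::vector<Token> tokenize(const std::string &text)
{
    std::vector<Token> tokens;
    std::string::size_type i = 0;
    while (i < text.size())
    {
        const char c = text[i];
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            // {integer}: convert the run of digits and return NUMBER
            std::string::size_type j = i;
            while (j < text.size() &&
                   std::isdigit(static_cast<unsigned char>(text[j])))
                ++j;
            Token t = { NUMBER, std::atoi(text.substr(i, j - i).c_str()) };
            tokens.push_back(t);
            i = j;
        }
        else if (c == ' ' || c == '\t')
        {
            ++i;                       // {ws}: eat whitespace
        }
        else if (c == '\n')
        {
            break;                     // newline signals the end of input
        }
        else
        {
            Token t = { c, 0 };        // any other character is its own token
            tokens.push_back(t);
            ++i;
        }
    }
    return tokens;
}
```

For the input "3 + 42\n" the function produces three tokens: NUMBER with value 3, the character '+', and NUMBER with value 42.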

    In our C++ comment stripper, we added the Flex generated C++ file to our C++Builder project. The calculator program takes a different approach. Instead of adding the generated C++ files to the project, we simply #include them from the main C++ source file (Listing 5.5).

    // Just include the cpp files for compilation. The BCB6 build tools don't
    // support the ability to compile the output of flex or bison
    #include "parser.cpp"
    #include "lexer.cpp"

    This might seem kind of kludgy, but it works, and it is actually easier to use and less quirky in practice than trying to add the generated C++ files to the project.

    5.4 Links to Flex and Bison resources

    Links
  • Flex homepage on the GNU website - http://www.gnu.org/software/flex/flex.html
  • Flex documentation on the GNU website - http://www.gnu.org/manual/flex-2.5.4/flex.html
  • Bison homepage on the GNU website - http://www.gnu.org/software/bison/bison.html
  • Bison documentation on the GNU website - http://www.gnu.org/manual/bison/index.html
  • The Lex and Yacc page - http://dinosaur.compilertools.net/

    Books
  • Lex and Yacc - http://www.amazon.com/exec/obidos/ASIN/1565920007/
  • Compilers: Principles, Techniques, and Tools - http://www.amazon.com/exec/obidos/ASIN/0201100886/

    6: wxWindows


    6.1 Introduction

    wxWindows is a cross platform, C++ library for building GUI applications. It works on a variety of compilers and platforms, including Windows, Linux, and Mac.

    wxWindows is similar to OWL and MFC in its structure. It is a pure C++ library. The class hierarchy includes classes for creating windows, buttons, list boxes, and so on. There is also a class that represents the application as a whole. Like OWL and MFC, wxWindows does not provide any form of RAD development.

    6.2 Installation

    wxWindows includes makefiles for building the library and the sample projects with Borland C++Builder. To utilize wxWindows, we need to build the libraries and then configure the IDE include and library paths to point to the wxWindows files. To install wxWindows, follow the steps below:

    1. Download wxWindows from www.wxwindows.org. The current version at the time of this writing is version 2.2.9. The ZIP file can be downloaded from the SourceForge repository at
      http://prdownloads.sourceforge.net/wxwindows/wxMSW-2.2.9.zip?download
    2. Extract the contents of the ZIP file to some directory (e:\code\lib\wxWindows). Allow your ZIP utility to rebuild the directory structure from the archive.
    3. In addition to the files in the ZIP file, you should also download the text files install_msw-2.2.9.txt and readme-2.2.9.txt from http://sourceforge.net/project/showfiles.php?group_id=9863 . Place them in the root directory where you extracted wxWindows.
    4. Create a system wide environment variable called WXWIN and set it to the root directory where you installed wxWindows (e:\code\lib\wxWindows).
    5. Open a command prompt window and change directories to the $(WXWIN)\src\msw subdirectory.
    6. The makefile for building wxWindows with BCB is makefile.b32. This makefile works with BCB6, except that you need to change one line. The makefile attempts to pass -WE as a command line option to the compiler. This option is not a valid option for bcc32 (it was an option for the 16 bit compiler). Edit $(WXWIN)\src\msw\makefile.b32 and remove the reference to -WE on line 985. Save the makefile after making the change.
    7. Build wxWindows by invoking make.
      make -fmakefile.b32

    That is all you need to do to build the wxWindows libraries. If you want, you can also build the sample programs from the command line. The sample projects are in $(WXWIN)\samples. Each example resides in a separate subdirectory. To build an example, navigate to one of these subdirectories in a console window and type make -fmakefile.b32.

    6.3 Creating wxWindows projects in the IDE

    If you want to do any serious wxWindows work, you will probably want to maintain your projects in the IDE. This makes debugging much easier. To create a new wxWindows app from scratch, follow these steps:

    1. Select File-New-Console Wizard. Check C++, and multithreaded. Leave all of the other check boxes unchecked and click OK.
    2. Save your project somewhere.
    3. It is important that your project use compiler settings that are compatible with the options that were used to build the wxWindows library. Specifically, the -a option and the -b option must match. The wxWindows makefiles use the -a1 and -b settings (byte alignment and integer sized enums). Open your project options and change the project settings to match.
      1. Go to Project-Options.
      2. On the Compiler tab, check the box that says Treat enum types as int.
      3. Go to the Advanced Compiler tab and check Byte alignment.
    4. Go to Project-Options-Directories/Conditionals. Add $(WXWIN)\include to the include path. Add $(WXWIN)\lib to the library path. Set the conditional defines to
      _DEBUG;__WXWIN__;__WXMSW__;__WINDOWS__;WIN32;__WIN95__;
      __BIDE__;INC_OLE2;WXDEBUG=1;__WXDEBUG__;USE_DEFINE.
    5. Add the wxWindows libraries to your project. Select Project-Add to project and navigate to the lib subdirectory of wxWindows. Select xpm.lib, wx32.lib, and any other libraries that you think you might need. Click Ok.
    6. Add wxWindows code to your project.
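Step 3 matters because the alignment setting changes the memory layout of every structure that crosses the library boundary. The sketch below imitates the effect of byte alignment on a single struct with #pragma pack, which most compilers accept. It illustrates the layout difference only and is not taken from the wxWindows sources; both struct names and the helper function are hypothetical.

```cpp
#include <cstddef>

// Laid out with the compiler's default alignment.
struct NaturalLayout
{
    char tag;
    int  value;
};

// Laid out with byte alignment, similar to what -a1 does for a whole project.
#pragma pack(push, 1)
struct ByteAligned
{
    char tag;
    int  value;
};
#pragma pack(pop)

// Number of padding bytes that the default layout adds to the same members.
std::size_t padding_bytes()
{
    return sizeof(NaturalLayout) - sizeof(ByteAligned);
}
```

On a typical 32 bit compiler the first struct occupies 8 bytes and the second occupies 5. If a library and an application disagree about which layout is in use, they read the value member from different offsets, which is why the project options must match the options in the makefile.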

    The ZIP file for this article contains a hello world wxWindows project that was constructed from this same set of instructions. If you have any difficulties creating a wxWindows project, check your settings against those in the supplied project.

    6.4 wxWizard: an IDE wizard for building wxWindows projects

    Creating a wxWindows IDE project is tedious and error prone. To simplify matters, I have created an IDE plug-in wizard for creating wxWindows projects. The wizard is included with the ZIP file for this article (see the wxWindows\wxWizard directory). Consult the readme.txt file that accompanies the wizard for instructions on how to install and use the wizard.


    7: PCRE


    7.1 Introduction

    PCRE is an open source C library for creating Perl Compatible Regular Expressions. It is a common library that has gained widespread acceptance and has been utilized in many popular projects such as Python, Apache, and PHP. The PCRE home page is http://www.pcre.org. The current version is 3.9.

    C++Builder comes with version 2.01 of PCRE preinstalled. You don't have to do anything special to install the library. The only downside to using the preinstalled version is that BCB does not supply the most current version of PCRE. In most cases, the older version is adequate. If you are interested in upgrading your version of PCRE, send me an email (hhowe@bcbdev.com).

    7.2 PCRE examples

    7.2.1 Raw PCRE calls

    The PCRE library consists of roughly ten C style functions, which are declared in the header file pcre.h (available in $(BCB)\include). Simple pattern matching requires only three of them: pcre_compile, pcre_exec, and pcre_free.

    The pcre_compile routine compiles a regular expression and returns a handle to the compiled expression object. pcre_exec executes a regular expression search. You pass it the compiled expression object, the string to search, and a handful of additional parameters. pcre_free is a function pointer that typically points to the free routine in the RTL.

    Listing 7.1 demonstrates how to match a pattern using the PCRE library.

    //------------------------------------------------------------------------------
    //Listing 7.1: pcre/pcre-test/main.cpp
    #include <cstdlib>   // declares free, which pcre_free points to below
    #include <cstring>
    #include <algorithm>
    #include <iostream>
    #include <pcre.h>
    
    using namespace std;
    
    // resolve pcre_free to workaround bug in BCB6 RTL
    void  (*pcre_free)(void *) = free;
    
    
    bool Test(const char *str)
    {
        // policy numbers start with MP or CM followed by 1 to 8 digits
        const char *policy_pattern = "^(MP|CM)(\\d{1,8})$";
        const char *errbuf = 0;
        int erroffset = 0;
        int offsets[45];
        int size_offsets = sizeof(offsets)/sizeof(int);
        pcre *regex = 0;
    
        regex = pcre_compile(policy_pattern, 0 , &errbuf, &erroffset, 0);
    
        // Note:
        // In the newest version of PCRE, pcre_exec takes an additional int
        // parameter. This argument is commented out in the call below.
        int result = pcre_exec(regex, 0, str, strlen(str), /*0,*/ 0 ,
                               offsets, size_offsets);
        cout << "regex  = " << policy_pattern << endl
             << "str    = " << str            << endl
             << "result = " << result         << endl;
    
        if(result > 0)
            cout << "regex matched" << endl;
        else if (result == -1)
            cout << "regex did not match" << endl;
        else
            cout << "a travesty occurred, maybe we should look at errbuf?" << endl;
    
        char buf[256];
        for( int j=0; j<result; ++j)
        {
            memset(buf, 0, 256);
            int start = offsets[j*2] ;
            int end   = offsets[j*2 + 1] ;
    
            std::copy(&str[start], &str[end],buf);  // could also use strncpy
    
            cout << "    offset[" << j*2 << "]="<< start
                 << "   :   str[" << start << "]="<< str[start] << endl;
            cout << "    offset[" << j*2+1 << "]="<< end
                 << "   :   str[" << end << "]="<< str[end] << endl;
            cout << "    subpattern[" << j << "] = " << buf << endl<< endl;
        }
    
        cout << endl;
        pcre_free(regex);
        return result > 0;
    }
    
    int main()
    {
        Test("MP001234");      // match
        Test("CM77123");       // match
        Test("xMP12");         // no match, leading x (the pattern is anchored with ^)
        Test("MP12x");         // no match, extra stuff at end
        Test("MP123456789");   // no match, too many digits (extra stuff at end)
        Test("foobar");        // no match
        return 0;
    }
    //------------------------------------------------------------------------------

    The meat of Listing 7.1 resides in a routine called Test. The Test function starts by declaring some variables that are needed by PCRE. These include the regular expression string, the regex object, variables for handling errors, and a buffer for storing the offsets of groups that are found during the search.

    Next, the function compiles the regular expression by calling pcre_compile. The code stores the result of pcre_compile in the regex variable. This variable acts as a handle to the compiled expression. The Test function passes this handle to pcre_exec.

    pcre_exec performs the search. It returns -1 if the search string did not match the regular expression. If the string does match, pcre_exec returns the number of groups that matched the expression, which is 3 in this case. The regular expression is ^(MP|CM)(\\d{1,8})$. This regex contains 3 groups: one for the entire string, one for the (MP|CM) part, and one for the trailing (\\d{1,8}). Note that each pair of parentheses delimits a group.
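
    The return value conventions described above can be collected in a small helper. The sketch below is not part of PCRE, and DescribeExecResult is a hypothetical name. It also assumes one case that Listing 7.1 lumps in with errors: according to the PCRE documentation for later releases, pcre_exec returns 0 when the string matched but the offsets array was too small to hold every group. I am assuming version 2.01 behaves the same way.

```cpp
#include <string>

// Hypothetical helper (not part of PCRE): turns the value returned by
// pcre_exec into a short description.
std::string DescribeExecResult(int result)
{
    if (result > 0)
        return "match";                           // result = number of groups
    if (result == 0)
        return "match, offsets array too small";  // assumption, see text
    if (result == -1)
        return "no match";                        // the -1 case from the text
    return "error";                               // any other negative code
}
```

    In Listing 7.1 you could print DescribeExecResult(result) in place of the if/else chain that follows the call to pcre_exec.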

    When a match is found, pcre_exec fills the offsets array with index values that tell you where the match groups reside in the search string. The offsets array is a little tricky to deal with. It contains two entries for each group in the regular expression. The first entry contains the index into the string where the group starts. The second entry contains the index for one past the end of the group.

    Because our regular expression has 3 groups, the offsets array will contain 6 values of interest when a match is found. For the test string MP001234, the offsets are:

    // str = "MP001234"
    offset[0]=0   :   str[0]=M
    offset[1]=8   :   str[8]=
    offset[2]=0   :   str[0]=M
    offset[3]=2   :   str[2]=0
    offset[4]=2   :   str[2]=0
    offset[5]=8   :   str[8]=
    

    Notice how offset[0] and offset[1] form a range that spans the entire match string (the range works like iterator ranges in the STL). This range equates to the first match group, which is the entire string. offset[2] and offset[3] form a range that spans the characters 'M' and 'P'. This range corresponds to the subgroup (MP|CM) in the regular expression. The last sub range is formed by offset[4] and offset[5], which contains the indices for the characters that matched the subgroup (\\d{1,8}).
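
    Because each pair of offsets is a half-open range, turning the pairs into strings takes only a few lines. The following is a minimal sketch that uses std::string in place of the fixed char buffer from Listing 7.1. ExtractGroups is a hypothetical helper, not something supplied by PCRE, and the range check inside it is purely defensive.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies each matched group out of str.
// 'result' is the value returned by pcre_exec and 'offsets' is the array
// that pcre_exec filled in (two entries per group).
std::vector<std::string> ExtractGroups(const std::string &str,
                                       const int *offsets, int result)
{
    std::vector<std::string> groups;
    for (int j = 0; j < result; ++j)
    {
        int start = offsets[j*2];       // index of first character in group
        int end   = offsets[j*2 + 1];   // index one past the last character

        // defensive: store an empty string if the pair is not a valid range
        if (start < 0 || end < start ||
            static_cast<std::size_t>(end) > str.size())
            groups.push_back(std::string());
        else
            groups.push_back(str.substr(start, end - start));
    }
    return groups;
}
```

    With the six offsets listed above for MP001234 and a result of 3, the function returns the strings MP001234, MP, and 001234. A negative result produces an empty vector.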

    After performing the regular expression search, the Test function frees the compiled expression object by calling pcre_free. A memory leak occurs if the regex object is not freed.
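
    One way to make sure the free always happens, even when a function has several return paths, is to hand the pointer to a small object whose destructor releases it. The class below is only a sketch of that idea. ScopedHandle is a hypothetical name, and the class works on a plain void pointer so that it compiles without pcre.h.

```cpp
// Hypothetical RAII holder, not part of PCRE. It owns a raw handle and calls
// the supplied release function once, when the holder goes out of scope.
class ScopedHandle
{
private:
    void  *m_handle;
    void (*m_release)(void *);

    // copying is disabled so that the handle cannot be released twice
    ScopedHandle(const ScopedHandle &);
    ScopedHandle& operator = (const ScopedHandle &);
public:
    ScopedHandle(void *handle, void (*release)(void *))
        : m_handle(handle), m_release(release)
    {}

    ~ScopedHandle()
    {
        // nothing to release if the handle is null
        if (m_handle && m_release)
            m_release(m_handle);
    }

    void *Get() const
    {
        return m_handle;
    }
};
```

    In Listing 7.1, you would declare ScopedHandle guard(regex, pcre_free); right after the call to pcre_compile and then drop the explicit call to pcre_free at the end of Test.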

    Note:

    BCB6 may sporadically give you unresolved linker errors for pcre_free and pcre_malloc. If this happens to you, you can solve the problem by resolving these function pointers yourself. Just insert this code into one of the C++ files in your project:

    // The import library for the RTL does not resolve pcre_malloc and pcre_free
    // in BCB6. Just go ahead and resolve them here.
    #if defined (__BORLANDC__) && (__BORLANDC__ == 0x560)
    void *(*pcre_malloc)(size_t) = malloc;
    void  (*pcre_free)(void *) = free;
    #endif
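
    Because pcre_malloc and pcre_free are ordinary function pointers, the same technique lets you point them at routines of your own, for example to check that every compiled expression is eventually freed. The sketch below shows the mechanism with stand-in pointers so that it compiles without pcre.h. The names demo_malloc, demo_free, CountingMalloc, CountingFree, InstallCountingHooks, and LiveBlocks are all hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>

// Stand-ins for pcre_malloc and pcre_free (hypothetical names). Like the
// real pointers, they start out pointing at the RTL routines.
void *(*demo_malloc)(std::size_t) = std::malloc;
void  (*demo_free)(void *)        = std::free;

static int live_blocks = 0;   // blocks handed out and not yet released

void *CountingMalloc(std::size_t size)
{
    void *p = std::malloc(size);
    if (p)
        ++live_blocks;
    return p;
}

void CountingFree(void *p)
{
    if (p)
        --live_blocks;
    std::free(p);
}

// Redirect the hooks to the counting routines.
void InstallCountingHooks()
{
    demo_malloc = CountingMalloc;
    demo_free   = CountingFree;
}

// Number of blocks allocated through the hooks that are still outstanding.
int LiveBlocks()
{
    return live_blocks;
}
```

    After InstallCountingHooks runs, every allocation made through demo_malloc raises the count and every release through demo_free lowers it. A nonzero value from LiveBlocks at shutdown points to a leak.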

    7.2.2 PCRE wrapper class

    The PCRE library is powerful, but its interface is also cumbersome. To simplify searching, I have created a wrapper class that encapsulates the PCRE API calls. Actually, there are two classes. The first class represents a compiled regular expression, and the second represents the results of a search. Listing 7.2 shows the declarations for these wrapper classes. The complete source is available in the archive (pcreobj.cpp). Listing 7.3 shows how to use the wrappers. It performs the same pattern matching as Listing 7.1.

    //------------------------------------------------------------------------------
    //Listing 7.2: pcre/pcre-wrapper/pcreobj.h
    #ifndef PCREOBJ_H
    #define PCREOBJ_H
    
    #include <string>
    #include <vector>
    #include <pcre.h>
    
    namespace re
    {
    #define MAX_OFFSETS 255
    
    class TRegExObj;
    class TRegExMatchObj;
    
    // Note:
    //   re::compile, search, and match don't really do anything that you couldn't
    //   do with the classes themselves. They exist to give users a clean way to
    //   perform a search in one line of code. They also make the C++ syntax for the
    //   re module similar to the re module in python. If you don't like these
    //   functions, you can skip them and use the classes and their constructors.
    
    TRegExObj      compile (const std::string &pattern, int flags=0);
    TRegExMatchObj search  (const std::string &pattern,
                            const std::string &search_string, int flags=0);
    TRegExMatchObj match   (const std::string &pattern,
                            const std::string &search_string, int flags=0);
    
    class TRegExObj
    {
    private:
        mutable const char * m_errbuf;
        mutable int          m_erroroffset;
        mutable int          m_offsets[MAX_OFFSETS];
        mutable pcre*        m_pcre;
        std::string  m_pattern;
        int          m_options;
        mutable bool         m_compiled;
    
        void InternalCompile() const;
        void ReleasePattern();
    public:
        TRegExObj(const std::string &pattern,    int options=0);
        TRegExObj(const char        *pattern="", int options=0);
        TRegExObj(const TRegExObj &);
        TRegExObj& operator = (const TRegExObj &);
        ~TRegExObj();
    
        void Compile(const std::string &pattern, int options=0);
        TRegExMatchObj Search(const std::string &str,
                              int options = 0,
                              size_t startpos=0,
                              size_t endpos=std::string::npos) const;
        TRegExMatchObj Match (const std::string &str,
                              int options = 0,
                              size_t startpos=0,
                              size_t endpos=std::string::npos) const;
    
        std::string GetPattern()
        {
            return m_pattern;
        }
    };
    
    class TRegExMatchObj
    {
    private:
        struct TMatchGroup
        {
            int          start;
            int          end;
            std::string  value;
            TMatchGroup(int s=0, int e=0, const std::string& str="")
               :start(s), end(e), value(str)
            {}
        };
    
        std::vector <TMatchGroup> m_Groups;
    
        friend class TRegExObj;
        TRegExMatchObj(int result, int *offsets, const std::string& str);
    
        void BuildMatchVector(int result, int *offsets, const std::string& str);
    public:
        TRegExMatchObj();
        std::string Group   (int GroupIndex = 0);
        size_t      GroupCount();
        int         Start   (int GroupIndex = 0);
        int         End     (int GroupIndex = 0);
        bool        Matched ();
    };
    
    }
    
    #endif
    //------------------------------------------------------------------------------
    //------------------------------------------------------------------------------
    //Listing 7.3: pcre/pcre-wrapper/main.cpp
    #include <cstring>
    #include <algorithm>
    #include <iostream>
    #include "pcreobj.h"
    
    using namespace std;
    using namespace re;
    
    bool Test(const char* str)
    {
        const char *policy_pattern = "(MP|CM)(\\d{1,8})$";
        TRegExObj expression(policy_pattern);
    
        TRegExMatchObj match;
        match = expression.Match(str);
        bool result = match.Matched();
    
        cout << "TRegExObj regex call:" ;
        if (result)
            cout << "match" << endl;
        else
            cout << "not a match" << endl;
    
        if(result)
        {
            // GroupCount returns size_t, so use the same type for the loop bound
            size_t group_count = match.GroupCount();
            for (size_t j = 0; j<group_count; ++j)
                cout << "match.Group(" << j << ") = " << match.Group(j) << endl;
        }
    
        cout << endl;
        return result;
    }
    
    int main()
    {
        Test("MP001234");      // match
        Test("CM77123");       // match
        Test("xMP12");         // no match, leading x (note use of PCRE_ANCHORED)
        Test("MP12x");         // no match, extra stuff at end
        Test("MP123456789");   // no match, too many digits (extra stuff at end)
        Test("foobar");        // no match
        return 0;
    }
    //------------------------------------------------------------------------------


    Copyright © 1997-2002 by Harold Howe.
    All rights reserved.