CS 351: Design of Large Programs
--------------------------------

Assignment # 4, Due Date 11/10/01
----------------------------------

The goal of this assignment is to extract information from an XML document and save it on disk. The XML document contains customer information and purchase orders (POs) accumulated over a period of about a week. Each week a new document containing fresh orders will be received, and the new information must be extracted and added to the purchase orders already saved on disk.

Extra Credit Part only: In addition, it should be possible to query the database for information on sales of specific items and/or customer purchases over a period (such as the first quarter of 2001).

Background:
-----------

There are at least two types of parsers used to parse XML documents. The first is based on the Document Object Model (DOM), which requires that the entire document be loaded into memory before it can be used. Another commonly used type is the SAX (Simple API for XML) parser, which you will use for this assignment. A SAX parser reads the document in stages and issues "callbacks" to user-written code as particular conditions occur, such as the start of an element or the occurrence of a string of characters between the start and end tags of an element.

Assignment Specifics:
---------------------

First, follow the instructions in Denys' handout on installing the libxml2 library on your system. Then make sure you can run the sample program handed out in class.

The program below (a modification of the program handed out in class):

(1) defines five callback functions (start/end of document, start/end of element, and the occurrence of a string between the start and end tags of an element);
(2) creates an instance of the callback structure and initializes the callback function pointers;
(3) creates a state structure which saves state information between callbacks;
(4) creates a parser and passes it pointers to the callback structure and to the state structure;
(5) starts the parser; and
(6) performs clean-up operations after the parser terminates.

Study the program below, since you will need most of its statements for this assignment.

    #include <stdio.h>
    #include <string.h>
    #include <libxml/parser.h>           /* xmlSAXHandler, xmlParseDocument */
    #include <libxml/parserInternals.h>  /* xmlCreateFileParserCtxt */

    typedef enum {
        parse_start_s = 0,
        parse_finish_s,
        parse_item_id_s,
        parse_quantity_s,
        parse_price_s,
        parse_discount_s,
        parse_other_s
    } Parse_state;

    /* State saved between callbacks. */
    typedef struct {
        Parse_state pstate;    /* which element's text is expected next */
        Parse_state discount;  /* parse_discount_s if the current item had a discount */
    } State_struc;

    static xmlSAXHandler callbacks;

    static void start_document(void *ctx)
    {
        State_struc *state_ptr = (State_struc *) ctx;

        state_ptr->pstate = parse_start_s;
        state_ptr->discount = parse_start_s;
        printf("Document started!!!\n");
    }

    static void end_document(void *ctx)
    {
        printf("Document ended!!!\n");
    }

    /* Remember which element we are in so chars_found knows what its text is. */
    static void start_element(void *ctx, const xmlChar *name, const xmlChar **attrs)
    {
        State_struc *state_ptr = (State_struc *) ctx;

        if (strcmp((const char *) name, "item_id") == 0)
            state_ptr->pstate = parse_item_id_s;
        else if (strcmp((const char *) name, "price") == 0)
            state_ptr->pstate = parse_price_s;
        else if (strcmp((const char *) name, "quantity") == 0)
            state_ptr->pstate = parse_quantity_s;
        else if (strcmp((const char *) name, "discount") == 0) {
            state_ptr->pstate = parse_discount_s;
            state_ptr->discount = parse_discount_s;
        }
        else
            state_ptr->pstate = parse_other_s;
    }

    static void end_element(void *ctx, const xmlChar *name)
    {
        printf("Element %s ended\n", (const char *) name);
    }

    /* Called with text found between tags; chars is NOT NUL-terminated. */
    static void chars_found(void *ctx, const xmlChar *chars, int len)
    {
        char buff[len + 1];  /* C99 variable-length array */
        State_struc *state_ptr = (State_struc *) ctx;

        memcpy(buff, chars, len);
        buff[len] = '\0';

        switch (state_ptr->pstate) {
        case parse_item_id_s:
            printf("\t i_id=%s", buff);
            state_ptr->pstate = parse_other_s;
            break;
        case parse_discount_s:
            printf("\t disc=%s", buff);
            state_ptr->pstate = parse_other_s;
            break;
        case parse_price_s:
            /* keep the price column aligned when the item had no discount */
            if (state_ptr->discount != parse_discount_s)
                printf("\t\t");
            state_ptr->discount = parse_other_s;
            printf("\t pr=%s", buff);
            state_ptr->pstate = parse_other_s;
            break;
        case parse_quantity_s:
            printf("\t Quan=%s\n", buff);
            state_ptr->pstate = parse_other_s;
            break;
        case parse_other_s:
        default:
            break;
        }
    }

    int main(void)
    {
        xmlParserCtxtPtr ctxt_ptr;
        State_struc state;

        /* zero every callback pointer; memset takes (ptr, value, size) */
        memset(&callbacks, 0, sizeof(xmlSAXHandler));
        callbacks.startDocument = start_document;
        callbacks.endDocument = end_document;
        callbacks.startElement = start_element;
        callbacks.endElement = end_element;
        callbacks.characters = chars_found;  /* signature matches, no cast needed */

        ctxt_ptr = xmlCreateFileParserCtxt("sales.xml");
        if (!ctxt_ptr) {
            printf("Failed to create file parser !!!\n");
            return -1;
        }
        ctxt_ptr->sax = &callbacks;
        ctxt_ptr->userData = &state;

        xmlParseDocument(ctxt_ptr);

        if (!ctxt_ptr->wellFormed) {
            printf("Document is not well formed!!!\n");
        }
        /* callbacks is static: stop xmlFreeParserCtxt from trying to free it */
        ctxt_ptr->sax = NULL;
        xmlFreeParserCtxt(ctxt_ptr);
        printf("Parsing complete!!! \n");
        return 0;
    }

******************************************************************

The POs that comprise the document your program will receive are defined by the DTD (Document Type Definition) below.

******************************************************************

A sample XML document of POs is shown below.

10/22/2001 John
1010 Main Albuquerque 87131 1234567 2347654
shoes 5.00 1 5.00 sugar 15 2.00 2 4.00 9.00 10.00 1.00
10/24/2001 Mary
2643 Bronze New York 88131 2345678 5678234 4765411
ink 1.00 5 5.00 5.00 VISA Mary 1234567890 07/03
10/24/2001 Steve
7432 Silver Greenbelt 89131 2345678 4320659
tooth paste 2.50 2 5.00 tooth brush 1.00 2 2.00 7.00 10.00 3.00
10/25/2001 John
5103 Gold Round Rock 90131 1234567
TV set 10 150.00 1 150.00 150.00 150.00 0.00
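The chars_found callback above handles whatever text arrives in a single call. A SAX parser, libxml2 included, may deliver the text of one element in several characters() calls (for instance around entity references or input-buffer boundaries), so a value such as "tooth paste" in the sample could arrive in pieces. Below is a minimal sketch, assuming a fixed-size buffer is acceptable, of how a hypothetical Text_buf (not part of the program above or of libxml2) could collect the pieces: start_element would call text_reset, chars_found would call text_append, and end_element would read the complete text.

```c
#include <string.h>

/* Hypothetical buffer for the text of the element currently being parsed.
   The pieces from several characters() callbacks are appended here and
   used only when the element's end tag is seen. */
#define TEXT_MAX 256

typedef struct {
    char text[TEXT_MAX];
    size_t len;
} Text_buf;

/* Call from start_element: forget any text from the previous element. */
static void text_reset(Text_buf *tb)
{
    tb->len = 0;
    tb->text[0] = '\0';
}

/* Call from chars_found: append len bytes, truncating if the buffer is full.
   chars need not be NUL-terminated, as with libxml2's characters callback. */
static void text_append(Text_buf *tb, const char *chars, int len)
{
    size_t room = TEXT_MAX - 1 - tb->len;
    size_t n = (len > 0) ? (size_t) len : 0;

    if (n > room)
        n = room;
    memcpy(tb->text + tb->len, chars, n);
    tb->len += n;
    tb->text[tb->len] = '\0';
}
```

A Text_buf field could be added to State_struc so that every callback reaches it through ctx; truncation at TEXT_MAX is a simplification, and a growable buffer would avoid it.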
******************************************************************

What should your program do?
----------------------------

Your program should support the following interaction with the user. In what follows, user responses are underlined.

    Enter name of your (local) file: menezes
                                     -------

Use your (last) name, since we will need to inspect your file later and must know which group created which file. (Last names in CS 351 are unique, so it is best to use yours as the name of the file that stores all the information extracted from the processed XML files.)

The response of your program should be either

    menezes does not exist

OR

    Loading menezes

followed by

    Enter xml filename: xyz.xml
                        -------

Parsing of xyz.xml should then begin, and your program should store the PO information in container(s) of your own design. When parsing is done, the following should be displayed:

    xyz.xml parsing over

Then the program should prompt the user for a command; the expected response is one of the following four:

    Enter Command: d
                   -
OR
    Enter Command: d*
                   --
OR
    Enter Command: d
                   -----------------
OR
    Enter Command: s
                   -

The first command should cause the program to display each PO extracted from xyz.xml in the format described below. The second command should cause the program to display each PO in menezes followed by each PO in xyz.xml. The third command should cause the program to display each PO extracted from xyz.xml that has been placed between and . The fourth command should save the orders extracted from xyz.xml: if menezes already exists, the orders are appended to it and menezes is saved on disk; if menezes does not yet exist, it is created and the new POs extracted from xyz.xml are saved in it.

The format for display of each PO is:

    Date:______  Order #: _________
    Customer: _____________________
    Address: ______________________________________________________________
    Phone: __________  Fax: ____________  Mobile: ___________

    Item#  ItemDesc  Disc  Price  Quantity  SubTotal
    -----  --------  ----  -----  --------  --------
    ...    ...       ...   ...    ...       ...
    ...    ...       ...   ...    ...       ...
    ...    ...       ...   ...    ...       ...

    Total: ___________________________
    Payment: __________________________________________________

This interaction can be repeated without limit, each time adding fresh POs to file menezes.

Note:
(1) Customers, POs and items are each uniquely identified by their respective ids.
(2) Your implementation should store information efficiently. For example, you may wish to use two containers: one to store customer information and the other to store PO information.
(3) Customer details (address, phone numbers, etc.) may change over time. For the extra-credit part, which involves queries related to, among other things, customer information, the latest information should be displayed.
(4) Design your display function so that it shows one screenful at a time.
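Notes (1) to (3) suggest keeping customers and POs in separate containers keyed by id. Below is a minimal sketch under that assumption; the record types, field names and sizes, and the functions customer_find and customer_upsert, are hypothetical and not taken from the assignment's DTD. Each PO stores only its customer's id, and customer_upsert overwrites an existing customer's details, so the table always holds the latest address and phone numbers, as note (3) asks.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layouts; names and sizes are assumptions. */
typedef struct {
    char id[16];
    char name[64];
    char address[128];
    char phone[16];
} Customer;

typedef struct {
    char po_id[16];
    char date[11];         /* e.g. "10/22/2001" */
    char customer_id[16];  /* refers to a Customer; details are not copied */
    double total;
} Purchase_order;

/* Growable array of customers. */
typedef struct {
    Customer *items;
    size_t count;
    size_t capacity;
} Customer_table;

/* Linear search by id; returns NULL if the customer is not stored yet. */
static Customer *customer_find(Customer_table *t, const char *id)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->items[i].id, id) == 0)
            return &t->items[i];
    return NULL;
}

/* Insert a new customer, or overwrite the stored details of an existing one
   so that the latest information wins. Returns 0 on success, -1 if memory
   could not be allocated. */
static int customer_upsert(Customer_table *t, const Customer *c)
{
    Customer *existing = customer_find(t, c->id);

    if (existing) {
        *existing = *c;
        return 0;
    }
    if (t->count == t->capacity) {
        size_t cap = t->capacity ? 2 * t->capacity : 8;
        Customer *p = realloc(t->items, cap * sizeof *p);
        if (!p)
            return -1;
        t->items = p;
        t->capacity = cap;
    }
    t->items[t->count++] = *c;
    return 0;
}
```

In the sample, John appears at 1010 Main and later at 5103 Gold; after both upserts the table would hold only the 5103 Gold address, while both of his POs keep pointing at the same customer id. A linear search keeps the sketch short; a sorted array or hash table would scale better as the file grows.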