DataMonkey

This is beta documentation which means it is a work in progress. Send any questions or comments to Theodore Ward.
DataMonkey is Copyright (C) 2002 AstroComma Incorporated and is freely distributable under the GNU Public License.
For the latest documentation and code, visit the AstroComma website.

  • Introduction
  • About DataMonkey
  • Overview
  • Defining The Input Format
  • A sample problem
  • Creating a new project
  • Describing input records
  • Describing input fields
  • Defining The Output Format
  • Creating the output schema
  • Linking the input fields to the output fields
  • Finishing Up
  • Executing the project

  • Introduction

    About DataMonkey

    DataMonkey is a tool that allows the user to translate data between disparate formats. The original intent was for the Petroleum industry, which is a very data intensive industry and uses a huge number of complex data formats. DataMonkey is not however specific to the petroleum industry.
    Most of the features used to create the data monkey import and export scripts in DataMonkey are accessed by right-clicking on various items in the User Interface tree.

    Overview

    To manipulate data with DataMonkey, you will need to define two files. An import file and an output schema.
    The import file contains an input schema which describes how the input data is formatted and commands that manipulate the data after if has been recognized by DataMonkey.
    The ouptut schema describes the ultimate format of the data.

    Defining The Input Format

    A sample problem

    Suppose we want to import an employee records file that looks like this:
    # Key
    # Record Type A, Last Name, First Name, Phone, Address1, Address2, City, State, Zip
    # Record Type B, Marital Status, Department, Title
    # Record Type C, Days Vacation per year, Days Vacation Available, Salary per year

    RecTypeA, Ward, Carl, 555-555-5555, 10 Oak, Nowhere, OK, 12345
    RecTypeB, Married, Sector G, Nuclear Safety Technician
    RecTypeC, 15, 5, 27000
    RecTypeA, Wilson, Wendi, 555-555-4444, 27 Elm, Springfield, OK, 45321
    RecTypeB, Single, Purchasing, Parts Buyer
    RecTypeC, 25, 16, 42200
    And we want to extract some of this data into a different text file that has the following format:
    Name (Firstname Lastname), Address, Marital Status (M/S/D), Salary

    Creating a new project

    Project Tree First we create a new import project. Click the "File" menu and select "New Project". You can now choose to create a blank project, use the project wizard or use a template. Select "Blank Project". Right click on the import icon and select "properties" you should see the project setup screen. For "Import Data Source" select "ASCII FILE" In the "ASCII File Information" section select the option "Each line may contain different kinds of data" and check the box that says "Generally data items are separated by this character", then in the text box to the right enter a comma ",". This tells DataMonkey to expect to work with a text file, that not all the lines in the text file are exactly the same and the data items we are interested in are separated by commas. Now click the OK button and right click on the "Input" folder in the project tree and select "Add Action". From the list, select Process the children items of this item This tells DataMonkey to begin processing records. This is neccessary in case there are actions you need DataMonkey to perform before and/or after processing the input data. Now click the icon at the top that looks like a floppy disk to save your current progress.

    Describing input records

    New Project Properties Dialog Now we will tell DataMonkey what the different records in the input file look like. There are three different records that have data we are interested in, so we will tell DataMonkey to create three record types for our Input Schema. Expand the "Input" folder in the project tree and right click on the "Records" folder and choose "Add Checkpoint". You can now expand the "Records" folder.

    There is now an item called "New CheckPoint". Click on the "New CheckPoint" text and change the name to "Record Type A". Now right click on the "Record Type A" icon and select "Add Action" in the list you are presented with select "Perform A Comparison" and click the "Create" button. You now will tell DataMonkey how to recognize Record Type A, which is any line whose first comma delimited paramter is "RecTypeA".

    In the dropdown for the left hand side of the equation select "Value from the current input line" this tells DataMonkey that we want to compare a value from a line in the input file. Enter 1 in the "Value For Position" text box. This tells DataMonkey to compare the first comma delimited value off of each line of the input text file. The "Values are delimited" checkbox should be checked and the "Delimiter" text box should contain a comma. DataMonkey remembed these values from when you were setting up the import properties.

    Now we have told DataMonkey what we are comparing (the first item from each line of text) now we need tell DataMonkey what to compare this value to. For the right hand side of the equation, select "Constant Value" this means you are going to type in a value to compare the record type to. In the text box below where it says "Constant Value" enter "RecTypeA". Now you can press the OK button.

    In the project tree, under "RecordTypeA" should be a line that reads "If the value at position 1 is RecTypeA". Now DataMonkey knows how to find Record Type A, but it won't do anything once the record is found. Right click on the "RecordTypeA" record item in the project tree and select "Add Action" in the list select "Process children items of this item". This tells DataMonkey that IF it finds a line of text whose first item is RecTypeA, THEN all the fields we define for RecTypeA should be processed.

    Since we need to tell DataMonkey about Record Type B and Record Type C we can save ourselves a little work by copying and pasting Record Type A then changing it. Right click on "RecordTypeA" under the records folder and select "copy" then right click on the "Records" folder and select paste. Do this one more time. You should now have three items under the "Records" folder named "RecordTypeA". Change the names of the second item to RecordTypeB and the third to RecordTypeC. Then expand each of these nodes and right click on the line that says "If the value at position 1 is RecTypeA" and change the right hand side of the equation to "RecTypeB" and "RecTypeC" respectively.

    You should now save your progress.

    Describing input fields

    Now we need to tell DataMonkey about the fields we are interested in for RecordTypeA, B and C. Right click on the "Fields" folder under "RecordTypeA" and select "Add DataItem" there should now be an item under "Fields" called "New DataItem1" click on the text and change it to "Name". We want to get the first name first so we can append the last name to it. Now right click the field and select "Add Action" and select Insert a value from the current line of input. This tells DataMonkey to get a value from the line of text that was recognized as "RecordTypeA". In the Position box enter 3 since the first name is the 3rd comma delimited item on the line. Make sure the checkbox that says "Overwrite current values contained in this field" is checked. This means the first name will not be "Appended" to previous values for this field. Click the OK button. Now we need to append a space and the LastName. So first right click on the "Name" field again and select "Add Action" and select "Insert a constant value into this field". Now enter a single space character and make sure the checkbox that says "Overwrite existing value" is NOT checked. This means a space will be added to any values already existing for this field. Next, add another action to the "Name" field of the type Insert a value from the current line of input. This time enter 2 in the Position box. This tells DataMonkey to get the 2nd parameter which is the employees last name. Also, make sure the checkbox that says Overwrite current values contained in this field is NOT checked. This will append the last name to the first name and the space.

    You should now be able to create records and fields and add Insert a value from the current line of input actions for the other values (Address, Marital Status, Salary) that we want to export to our new text file.

    When you get to the Marital Status field for RecordTypeB, we add a command get get the second value, but then there is a problem. We want to have either M for married, S for single or D for divorced, but the data in our input file has the words "Married" or "Single". There are two ways we can solve this problem. One way is to set up comparisons like we did when deciding what records to process and then after each comparison create a command to "Insert A Constant Value Into This Field" which would be M, S or D. An easier way though is to create a substitution table command. Right click on the field "Marital Status" and select "Add Action" then select "Get a new value for this field from a conversion table". You should be presented with a grid with two columns "Lookup" and "Convert to". In the first column enter "Married" and in the second column enter "M" then on the next row enter "Single" and "S" and "Divorced" and D. This tells DataMonkey to substitute the values on the right for the values on the left.

    Defining The Output Format

    Creating the output schema

    Before we add an output source to the project tree, we have to create the Output Schema. The output schema is basically a list of records and fields that we want to have when all is said and done. Click on the "Tools" menu and then go into the "Output Schema" submenu and select "Add New..." You should see a new tree. First click on the empty space and give you schema a name such as "Employee Information". Now select "Employee Information" and click on the "Add Item" button. This creates a new "Table" or "Record" entry. Click on the new record and change the name to "My Record". In this example all our output records will look the same, so we won't add any more records. Now select "My Record" and click the "Add Item" button. You have now created an output field. Change the name of this field to "Field". The default options for the field should be fine for this example. Now create fields and name them "Address", "Marital Status" and "Salary". Click the OK button and save your output schema file as "MySchema.sch"

    Now, right click on the "Output" folder in the project tree and select "Add Output Links". You may have to browse for "MySchema.sch" then you should be able to select "Employee Information", do so and click the OK button. You should see the "Output Links" property window. You might want to leave the "Output To" field blank for now so you can more easily test your project. You can always set the output to "ASCII File" later. Click the OK button.

    Linking the input fields to the output fields

    Finally, we tell DataMonkey how to move data from the input source to our output schema. Expand the "Output" folder in the project tree and right click on "Employee Information" and select "Edit Relationships". This part is really easy. click on the fields on the left side and drag them on top of the fields on the right side that they match up with and release. The icons for the input and output schema fields should change colors and a link should appear in the center. If you make a mistake, simply select the incorrect link and click the delete button. Repeat this process for all four of your fields. Now close the window and save your project.

    Finishing Up

    Executing the project

    We are ready to try things out. Click the "!" exclamation point or select "Import" from the "Run" menu. After selecting the file to import, if all goes well you should see output such as this:
    My Record
        Name: Carl Ward, Address: 10 Oak, Nowhere, OK , Marital Status: M, Salary: 27000
        Name: Wendi Wilson, Address: 27 Elm, Springfield, OK , Marital Status: D, Salary: 42200
    This may not be exactly what you were expecting. This is because output to the screen is a debugging feature not a formatting feature. You can now edit the output schema properties and select "ASCII File". The next time you run the import you will be asked to select a file to write the output to. Your output should now appear as you had hoped.

    Note - We are working on adding options to DataMonkey's text output format. Currently DataMonkey outputs text in a database style format, and your output will look like the following:
    My Record
        Name: Carl Ward
        Address: 10 Oak, Nowhere, OK
        Marital Status: M
        Salary: 27000

    My Record
        Name: Wendi Wilson
        Address: 27 Elm, Springfield, OK
        Marital Status: D
        Salary: 42200

    home