Introducing Data Loading

Data loading is the process of copying data from external data sources, such as spreadsheets or SQL databases, into an Essbase database. After you load the data sources into an Essbase database, you can view and analyze the data quickly. This chapter describes the various components involved in loading data, such as rules files, data sources, and free-form data source. This chapter contains the following sections:

Introduction to Data Sources

As illustrated in Figure 286, a data source is composed of records and fields. A record is a row of fields that is read as a unit. A field is a vertical list of values.

Figure 286: Records and Fields

As illustrated in Figure 287, data sources can contain dimension fields, member fields, and data fields. Dimension fields identify the dimensions in the database. Although you can set dimension fields in the data source, usually you define them in the rules file. Member fields identify members of the dimensions in the database. Data fields contain the data that is stored in the database.

Figure 287: Kinds of Fields

How Does Essbase Read a Data Source?

Essbase reads data sources starting at the top and proceeding from left to right. To load a data value successfully, Essbase must encounter one member from each dimension before encountering the data value. For example, in Figure 287, Essbase loads the data value 42 into the database with the members Texas, 100-10, Jan, Sales, and Actual. If Essbase encounters a data value before all members are specified, it stops loading the data source.

The data source can contain only dimension names, member names, alias names or data values; it cannot contain miscellaneous text. Not only must the data source contain enough information, the information must be in an order Essbase understands. Data sources, therefore, must be complete and correctly formatted.

Before you load data or build dimensions, you must format your data source so that it maps to the multidimensional database you are loading it into. You can format your data source in the following ways:

When Essbase loads data from external sources:

  1. Essbase reads the external data source. You must format the external data source carefully.
  2. If you are using a rules file, Essbase transforms the data to match the Essbase database during loading without changing the original data source. You must use a rules file if:
  3. Essbase stores the data in the multidimensional database.

If you are loading data into a transparent partition, follow the same steps as for loading data into a local database.

Valid Data Fields

A data field is a specific kind of field in a record. Data fields contain the data for their intersection in the database. In Figure 287, for example, 42 is a data field. It is the dollar sales of 100-10 (Cola) sold in Texas in January.

Essbase accepts only the following kinds of data fields:

Data Field Types
Examples

Numbers and their modifiers with no spaces or separators between them:

 

  • Numbers (0-9)

12

  • Dollar sign ($)

$ 12 is not a valid value because of the space between the dollar sign and the 12. $12 is a valid value.

  • Euro currency symbol

12

  • Numbers in parentheses (to indicate a negative number)

(12)

  • Minus sign before numbers. Minus signs after numbers are not valid.

-12

  • Decimal point

12.3

Large numbers with or without commas

Both 1,345,218 and 1345218 are valid values.

#MI or #MISSING to represent missing or unknown values

You must insert #MI or #MISSING into a data field that has no value. If you don't, the data source may not load correctly. To replace a blank field with #MI or #MISSING, see Replacing an Empty Field with Text.



Valid Member Fields

A member field contains the name of a member or alias in a dimension. In Figure 287, for example, Texas and Ohio are members of the Market dimension. Member fields must be formatted as follows:

Data Field Types
Examples

Member fields must map to member names or aliases in the database.

A member field called Texas maps to the Texas member in the Sample Basic database. A member field called Salesperson does not map to any member in the Sample Basic database and is, therefore, invalid.

You can only load data into members that are pre-calculated, that is, you cannot load data into Dynamic Calc or Dynamic Calc And Store members.

If Year is a Dynamic Calc member, you cannot load data into it. Instead, load data into Qtr1, Qtr2, Qtr3, and Qtr4, which are not Dynamic Calc members.

A member name must be enclosed in quotes if it contains any of the following:

  • White space
  • Numeric characters (0-9)
  • Dashes (minus signs, hyphens)
  • Plus signs
  • & (ampersands)

For files that free form load into the Sample Basic database, the 100-10 product member name must be in quotes: "100-10"

For rules on naming dimensions and members, see Rules for Naming Dimensions and Members.

If a member field maps to an alias, Essbase uses the current alias table.

Default is the name of the default alias table.



Invalid Member or Data Fields

When Essbase encounters an invalid member or data field, it stops the data load. Essbase loads any fields read before the invalid field into the database, resulting in a partial load of the data.

In the following file, for example, Essbase stops the data load when it encounters the 15- data value. Essbase loads the Jan and Feb Sales records, but not the Mar and Apr Sales records.

Figure 288: Invalid Data Field

East Cola   Actual
Sales       Jan     $10
            Feb     $21
            Mar     $15-
            Apr     $16 

For information on continuing the load, see Loading Dimension Build or Data Load Error Logs.

Setting File Delimiters

You must separate fields from each other with delimiters. Delimiters can be any combination of the following:

Note: You cannot use commas as delimiters in free-form data sources, although you can use them in data sources you are loading using a rules file.

The delimiter you use can vary between fields. Essbase ignores excess delimiters in free-form data sources.

In Figure 289, for example, the fields are separated by spaces. Essbase ignores the extra spaces between East and Cola in the first record.

Figure 289: File Delimiters

East    Cola    Actual    Jan    Sales    10
East    Cola    Actual    Feb    Sales    21
East    Cola    Actual    Mar    Sales    30 

For more information, see Setting File Delimiters.

Ignored Characters

Some characters are in the data source for formatting reasons only. For that reason, Essbase ignores the following characters:

==

Two or more equal signs, such as for double underlining

- -

Two or more minus signs, such as for single underlining

_ _

Two or more underscores

==

Two or more IBM PC graphic double underlines (ASCII character 205)

_ _

Two or more IBM PC graphic single underlines (ASCII character 196)



Ignored fields do not affect the data load.

For example, Essbase ignores the equal signs in Figure 290, but loads the other fields normally.

Figure 290: Ignoring Formatting Characters During Loading

East Actual "100-10"
        Sales    Marketing
        =====    =========
Jan     10       8
Feb     21       16 

Introduction to Rules Files

Data load rules are a set of operations that Essbase performs on data when it loads the associated data source into the database, such as rejecting invalid records in the data source. Data sources are external sources of data such as spreadsheet files, text files, or SQL data sources. Applying data load rules to data sources makes it possible to map external data values to an Essbase database during loading.

Figure 291: Loading Data Sources through Rules Files

Data load rules are stored in rules files. Essbase loads the data in the data source into the database through the rules file without changing the data source. You can re-use a rules file with any data source that requires the same set of data loading rules.

You also use rules files in dimension build operations to add or change members and dimensions in outlines. For information about creating rules files and defining them for dimension build operations, see Introducing Dynamic Dimension Building and Building Dimensions Using a Rules File.

When to Use Data Load Rules

Use data load rules when the data load should:

See Setting up a Rules File to Manipulate Records, and Manipulating Fields Using a Rules File, for information about manipulating fields and records. See Introducing Dynamic Dimension Building, for information about changing or adding members and dimensions.

How to Create Data Load Rules

You create data load rules using the following process:

  1. Select the data source in the Data Prep Editor. For example, you can select an SQL data source, a spreadsheet, or a text file. See Selecting the Data Source.
  2. Set the file delimiter for your data source. For example, you can select tabs, commas, or spaces. See Setting File Delimiters.
  3. Perform operations on records. For example, you can set up header records, and define select and reject operations. See Setting up a Rules File to Manipulate Records.
  4. Perform operations on fields. For example, you can split them, join them, and create new ones. See Manipulating Fields Using a Rules File.
  5. Map fields in the data source to dimensions and members in the database. See Mapping Fields to Member Names.
  6. Save and validate the data load rules. For more information, see Validating and Saving Data Load Rules.

How Does Essbase Execute Operations in a Rules File?

When Essbase loads data using a rules file, it executes the operations in the rules file in the following order:

  1. Essbase sets all file delimiters, including fixed-width columns. For more information, see Setting File Delimiters.
  2. Essbase performs all field operations in the order they are defined in the rules file. Field operations alter the position or number of fields and include moves, splits, joins, create using text, and create using join operations. For more information, see Ordering Fields.
  3. If you're not sure in what order the field operations were defined, select Options > Data File Properties and click the Field Edits tab. The Data File Properties dialog box appears, listing all the field operations.

    The rules file in Figure 292, for example, contains move, split, and join operations.

    Figure 292: Field Operations

  4. Essbase applies all properties for each field, applying all of the properties to field1 before proceeding to field2. Essbase applies field properties in the following order:
    1. Ignores fields set to ignore during data load.
    2. Ignores fields set to ignore during outline update.
    3. Flags the data field.
    4. Applies field names.
    5. Applies field generations.
    6. Performs all replaces in the order they are defined in the rules file. If you're not sure what order the replace operations are in, select Field > Properties and click the Global Properties tab. The Field Properties dialog box appears, listing all replace operations.
    7. Drops leading and trailing white spaces.
    8. Coverts spaces to underscores.
    9. Applies suffix and prefix operations.
    10. Scales data values.
    11. Converts text to lowercase.
    12. Converts text to uppercase.
    13. For more information, see Manipulating Fields Using a Rules File.

  5. If you choose to skip lines, Essbase skips the number of lines that you specified, otherwise Essbase proceeds to the first record. For more information, see Defining Header Information in the Rules File.
  6. Essbase performs selection or rejection criteria in the order that they are defined in the rules file. Essbase loads or doesn't load individual records in the data source based on these criteria specified in the rules file.
  7. If you're not sure in what order the selection or rejection criteria are defined, select Record > Select or Record > Reject. The Select Record or Reject Record dialog box displays, listing all the selection or rejection operations.

    The rules file in Figure 293, for example, contains a selection criterion.

    Figure 293: Selecting Records

Rules for Rules File Data Sources

This section describes rules you must follow when formatting data sources that are loaded using rules files.

Rules for Dimension Fields

Essbase must be able to identify each dimension in the database using information in the data source or the rules file. The field values in a dimension field must contain members for that dimension. For example, a field defined as Year has members such as Jan, Feb, and Mar. A field defined as Product has members such as Cola and Root Beer.

If the data source does not identify each dimension in the database, you must identify the missing dimensions in a header record. For example, the Sample Basic database has a dimension for Year. If several data sources arrive with monthly numbers from different regions, the month itself might not be specified in the data sources. You must set header information to specify the month. For information on setting header records, see Using Header Information.

If a member value is missing for a dimension field, the value from the last valid record is used. For example, when you load Figure 294 to the Sample Basic database, Essbase maps the Ohio member field into the Market dimension for all records, including the records that have Root Beer and Diet Cola in the Product dimension.

Figure 294: Valid Missing Members

Jan, Sales, Actual
Ohio       Cola          25
           "Root Beer"   50
           "Diet Cola"   19 

Essbase stops the data load if no prior records contain a value for the missing member field. If you tried to load Figure 295 into the Sample Basic database, for example, the data load would stop while trying to process the first record, because the Market dimension (Ohio, in Figure 294) is not specified.

Figure 295: Invalid Missing Members

Jan, Sales, Actual
           Cola          25
           "Root Beer"   50
           "Diet Cola"   19 

For information on restarting the load, see Loading Dimension Build or Data Load Error Logs.

Rules for Member Fields

After Essbase identifies all dimensions, it maps the member fields into the appropriate members in the outline. A member field can map to a single member name, such as Jan (which is a member of the Year dimension), or a member combination, such as Jan, Actual (which are members of the Year and Scenario dimensions).

If the data source contains member fields, the data source or rules file must identify the dimensions they map to in the database. For example, the following file contains member fields for Jan, Cola, East, Sales, and Actual. The rules file must identify the dimensions that those members map to, in this case Year, Product, Market, Measures, and Scenario.

Figure 296: All Dimensions Specified

Jan    Cola    East    Sales    Actual    100
Feb    Cola    East    Sales    Actual    200 

Quoting Member Names

You must use double quotes around member names that contain the same character as the file delimiter. File delimiters are the character(s) that separate fields in the data file.

Note: You do not have to double quote any member names that come from SQL data sources, because the fields in SQL data sources are not delimited by characters.

For example, if your data source is delimited by spaces, use quotes around member names with embedded spaces. Figure 297, for example, quotes New York, because it has a space in it:

Figure 297: Quoted Member Names

Cola    Jan    "New York"   Actual    Sales   50
Cola    Jan    Ohio         Actual    Sales   78 

Unknown Member Names

If a member field contains an unknown member name, Essbase rejects the entire record during the data load. If there was a prior record with a member name for the missing data load field value, Essbase continues to the next record. If there are no prior records, the data load stops.

For example, when you load Figure 298 into the Sample Basic database, Essbase rejects the record containing Ginger Ale because it is not a valid member name. Essbase loads the records containing Cola, Root Beer, and Cream Soda. If Ginger Ale were in the first record, however, the data load would stop.

Figure 298: Unknown Members

Jan, Sales, Actual
Ohio    Cola          2
        "Root Beer"   12
        "Ginger Ale"  15
        "Cream Soda"  11 

Note: Instead of rejecting the record, you can add the new members encountered to the database using the dimension build feature. See Introducing Dynamic Dimension Building.

For information on restarting the load, see Loading Dimension Build or Data Load Error Logs.

Rules for Data Fields

After Essbase identifies all dimensions and maps the member fields into the appropriate members in the outline, it loads the data fields to the Essbase database. The data source or rules file must contain enough information for Essbase to determine where to put the data. To Essbase, data are the numbers stored for each intersection in the database. In Figure 287, for example, 42 is the data stored in the database as the actual quantity of 100-10 (Cola) sold in Texas in Jan (January).

If the data source contains a member field for every dimension and only one data column, you must set the data column as a data field. To read Figure 299 into the Sample Basic database, for example, identify the last column as a data field.

Figure 299: Setting Data Fields

Jan    Cola    East    Sales    Actual    100
Feb    Cola    East    Sales    Actual    200 

To identify a column as a data field, see Defining a Column as a Data Field.

Assigning All Members

The field name you assign to a data field must be a dimension, a member, or a member combination from the database. For example, the data field in the following file specifies each member so Essbase knows where to put the data.

Figure 300: Assigning Data Fields

             Jan, Actual
Cola         East    Sales   100
"Root Beer"  East    Sales   200 

The only exception to this rule is a data source where each record contains a data load field for every dimension and one data column, such as Figure 299, where each record specifies each dimension (for example, Jan, Cola, East, Sales, and Actual) and the final column is a data field (for example, 100).

Empty Data Fields

If there is no value in the data field (or the value is #MISSING), Essbase does not change the existing data value in the database. Essbase won't replace current values with empty values.

Note: If the data source contains blank fields for data values, replace them with #MI or #MISSING. Otherwise, the data may not load correctly. To replace a blank field with #MI or #MISSING, see Replacing an Empty Field with Text.

Rules for Extra Fields

If the data source contains fields that you don't want to load into the database, you can tell Essbase to ignore those fields. For example, the Sample Basic database has five standard dimensions into which you would load data: Year, Product, Market, Measures, and Scenario. If the data source had an extra field, such as Salesperson, that isn't a member of any dimension, tell Essbase to ignore the Salesperson field during the data load.

No Blank Fields

If a rules file has blank fields, the data source won't load. So, for example, if your rules file has extra fields at the end, it won't work. Join the empty fields with the field next to them.

For more information, see Joining Fields.

Each Record Must Have the Same Number of Fields

Each record must have the same number of fields. If fields are missing, the data loads incorrectly. For example, the file in Figure 301, is invalid, because there is no value under Apr. To fix the file, insert #MISSING or #MI into the missing field.

Figure 301: Missing Fields

Actual Ohio Sales Cola
Jan     Feb    Mar    Apr
10      15     20 

Figure 302 is valid because #MI was inserted to replace the missing field.

Figure 302: Valid Missing Fields

Actual Ohio Sales Cola
Jan     Feb    Mar    Apr
10      15     20     #MI 

Rules for File Delimiters

A data source cannot have extra file delimiters if you are using a rules file. The rules file reads the extra delimiters as empty fields. For example, if you tried to load the file in Figure 303 into the Sample Basic database using a rules file, it would fail. Essbase reads the extra comma between East and Cola in the first record as an extra field. Essbase then puts Cola into Field 3. In the next record, however, Cola is in Field 2. Essbase expects Cola to be in Field 3 and stops the data load.

Note: You cannot use commas as delimiters in free-form data sources, although you can use them in data sources you are loading using a rules file.

Figure 303: File Delimiters

East,,Cola,Actual,Jan,Sales,10
East,Cola,Actual,Feb,Sales,21
East,Cola,Actual,Mar,Sales,30 

To solve the problem, delete the extra delimiter from the data source.

Rules for Free-Form Data Sources

If a data source contains enough information to load into the database, you can load the data source directly. This kind of load is called a free-form data load.

This section describes how free-form data sources must be formatted. If your data source is not correctly formatted, it will not load. You can edit your data source directly to fix the problem. If you find that you must perform many edits (such as moving several fields and records), it might be easier to load the data source using a rules file. See Introduction to Rules Files.

Note: If the data source contains blank fields for data values, replace them with #MI or #MISSING. Otherwise, the data may not load correctly. To replace a blank field with #MI or #MISSING using a rules file, see Replacing an Empty Field with Text.

As a free-form data source, you can use a file previously created through the Application Manager's export feature. Such a file is already formatted properly.

Use the LOADDATA command in ESSCMD to load data free form. See the Technical Reference in the docs directory for information about this command. See Automating the Production Environment for information about ESSCMD.

Formatting Ranges of Member Fields

You can express member names as ranges within a dimension. For example, Sales and Profit form a range in the Measures dimension. Ranges of member names can handle a series of consecutive values.

A data source can contain ranges from more than one dimension at a time.

In Figure 304, for example, Jan and Feb form a range in the Year dimension and Sales and Profit form a range in the Measures dimension.

Figure 304: Multiple Ranges of Member Names

Texas             Sales       Profit
                  Jan   Feb   Jan   Feb
Actual  "100-10"  98    89    26    19
        "100-20"  87    78    23    32 

In Figure 304, Sales is defined for the first two columns and Profit for the last two.

Ranges Set Automatically

When Essbase encounters two or more members from the same dimension with no intervening data fields, it sets up a range for that dimension. The range stays in effect until Essbase encounters another member name from the same dimension, at which point Essbase replaces the range with the new member or new member range.

Figure 305, for example, contains a range of Jan to Feb in the Year dimension. It remains in effect until Essbase encounters another member name, such as Mar. If Essbase encounters Mar, the range changes to Jan, Feb, Mar.

Figure 305: Ranges of Member Names

Texas Sales
                  Jan   Feb   Mar
Actual  "100-10"  98    89    58
        "100-20"  87    78    115 

Data Values Out of Range

When Essbase encounters a member range, it assumes that there is a corresponding range of data values. If the data values are not in the member range, the data load stops. Essbase loads any data fields read before the invalid field into the database, resulting in a partial load of the data.

Figure 306, for example, contains more data fields than the defined range of members. The data load stops when it reaches the 10 data field. Essbase loads the 100 and 120 data fields into the database.

Figure 306: Extra Data Values

Cola Actual East
         Jan    Feb
Sales    100    120    10
COGS     30     34     32 

For information on restarting the load, see Loading Dimension Build or Data Load Error Logs.

Duplicate Members in a Range

Be sure to structure ranges in the source data so that Essbase interprets them correctly. If the same member appears more than once in a range, Essbase ignores the duplicate members.

The file in Figure 307 contains two ranges: Actual to Budget and Sales to COGS. It also contains duplicate members.

Figure 307: Duplicate Members in a Range

Cola East
        Actual    Budget    Actual    Budget
        Sales     Sales     COGS      COGS
Jan     108       110       49        50
Feb     102       120       57        60 

Essbase ignores the duplicate members. The members that Essbase ignores have a line through them in the following example:

Figure 308: Ignored Duplicate Members

Cola East
        Actual    Budget    Actual    Budget
        Sales     Sales     COGS      COGS
Jan     108       110       49        50
Feb     102       120       57        60 

For Actual, the first member in the first range, Essbase maps data values to each member in the second range. Essbase then proceeds to the next value in the first range, Budget, similarly mapping values to each member in the second range. As a result, Essbase Essbase interprets the file as shown in Figure 309.

Figure 309: How Essbase Interprets the File in Figure 307

Cola East
           Actual            Budget
           Sales     COGS    Sales    COGS
Jan        108       110     49       50
Feb        102       120     57       60 

How Essbase Reads Multiple Ranges

As Essbase scans a file, it processes the most recently encountered range first when identifying a range of data values. In Figure 309, for example, there are two ranges: Actual and Budget and Sales and COGS. While reading the file from left to right and top to bottom, Essbase encounters the Actual and Budget range first and the Sales and COGS range last. Because the Sales and COGS range is encountered last, Essbase puts data fields in that part of the database first.

Formatting Columns

Files can contain columns of fields. Columns can be symmetric or asymmetric. Symmetric columns have the same number of members under them. Asymmetric columns have different numbers of members under them. Essbase supports loading data from both types of columns.

Symmetric Columns

Symmetric columns have the same number of members under them. In Figure 310, for example, each dimension column has one column of members under it. For example, Product has 100-10 under it.

Figure 310: Symmetric Columns

Product     Measures   Market    Year    Scenario    *data*
"100-10"    Sales      Texas     Jan     Actual      112
"100-10"    Sales      Ohio      Jan     Actual      145 

The columns in the following file are also symmetric, because Jan and Feb have the same number of members under them:

Figure 311: Groups of Symmetric Columns

                              Jan           Feb
                        Actual  Budget  Actual  Budget
"100-10"  Sales  Texas  112     110     243     215
"100-10"  Sales  Ohio   145     120     81      102 

Asymmetric Columns

Columns can also be asymmetric. In Figure 312, the Jan and Feb columns are asymmetric because Jan has two columns under it (Actual and Budget) and Feb has only one column under it (Budget):

Figure 312: Valid Groups of Asymmetric Columns

                 Jan     Jan     Feb
                 Actual  Budget  Budget
"100-10"  Sales  Texas   112     110      243
"100-10"  Sales  Ohio    145     120      81 

If a file contains more than one asymmetric group of member columns, you must label each column with the appropriate member name.

The file in Figure 313, for example, is not valid because the column labels are incomplete. The Jan label must appear over both the Actual and Budget columns.

Figure 313: Invalid Asymmetric Columns

                         Jan             Feb
                         Actual  Budget  Budget
"100-10"  Sales  Texas   112     110     243
"100-10"  Sales  Ohio    145     120     81 

This file in Figure 314 is valid because the Jan label is now over both Actual and Budget. It is clear to Essbase that both of those columns map to Jan.

Figure 314: Valid Asymmetric Columns

                         Jan     Jan     Feb
                         Actual  Budget  Budget
"100-10"  Sales  Texas   112     110      243
"100-10"  Sales  Ohio    145     120      81 

Security and Multi-User Considerations

Essbase supports concurrent multiple users reading and updating the database. This means that users can use the database while you are dynamically building dimensions, loading data, or calculating the database. In a multiple-user environment, Essbase protects your data using the security system described in Managing Security for Users and Applications.




© 2002 Hyperion Solutions Corporation. All rights reserved.
http://www.hyperion.com