QlikView

Thursday, 26 February 2015

Cross Table in QlikView

Cross Table is a table with Orthogonal list of headers. Orthogonal simply means at right angles.

We come across cross tables everywhere in our daily lives.

Michael Jordan is the greatest basketball player of our times. I am a big fan of Michael Jordan. To explain the concept of Cross table, i will take a sample of Michael Jordan’s regular season statistics.

Michael Jordan Stats – Source espn.go.com

This is a cross table because it contains 3 Orthogonal lists of headers – Season,Team and Categories.

if this is loaded in QlikView in a normal fashion, it will create separate list for each of the columns. After the data load, it will look like

If the Cross table is very large with huge number of Orthogonal headers then it will make the application very heavy because each of the column in the cross table will be loaded separately.

If you view this table in the table viewer, it will look like

In QlikView you have to load a CrossTable by using CrossTable transformation (prefix load statement with CrossTable syntax). It will then create one column for Season,One for Team ,One for each of the Category and One for Data.

Loading Data by using CrossTable transformation -

To load a crosstable, start with similar steps as loading Table file/excel file, click on “Table files”

Click on “Enable Transformation step” to apply CrossTable transformation

Click on “CrossTable” under file parameters

Select the Qualifier field/s, Attribute field and Data field.

Qualifier fields are the columns to the left. You can decide which columns you want to treat as qualifiers.

Loaded Crosstable will look like the following , which is one column for Season,One for Team ,One for each of the Category and One for Data.

CrossTable Load script will look like the following. Number 2 in Crosstable syntax signifies the 2 Qualifier fields Season and Team.

After the data load, If you see the Table viewer , it will look like

Fields will be loaded in the following fashion

Please watch my video for step-by-step method of loading Cross table

Wednesday, 25 February 2015

Incremental Load example

In my previous article, we discussed “How to use QVDs to make your QlikView application more efficient?”. In this article, we will go one step ahead to make our application more efficient while dealing with large transactional data. As discussed in my previous article I was working on a QlikView application, where I had to show the sales across various channels for pre-defined frequencies (e.g. Daily, Monthly, Yearly).

Initially, I was reloading the entire transactions table on a daily basis even though I already had the data till yesterday with me. This not only took significant time, but also increased the load on the database server and the network.This is where incremental load with QVDs made a huge difference by loading only new or updated data from the database into a table.

Incremental loads:

Incremental load is defined as the activity of loading only new or updated records from the database into an established QVD. Incremental loads are useful because they run very efficiently when compared to full loads, particularly so for large data sets.

Incremental Load, QlikView, QVD, Optimize Qlikview

Incremental load can be implemented in different ways, the common methods are as follows:

Insert Only (Do not validate for duplicated records)
Insert and Update
Insert, Update and Delete

Let us understand each of these 3 scenarios with an example

1. Insert Only:

Let us say, we have sales raw data (in Excel) and whenever a new sales get registered, it is updated with basic details about the sale by modified date. Since, we are working on QVDs, we already have QVD created till yesterday (25-Aug-14 in this case). Now, I want to load only the incremental records (Highlighted in yellow below).

To perform this exercise, first create a QVD for data till 25-Aug-14. To identify new incremental records, we need to know the date till which, QVD is already updated. This can be identified by checking the maximum of Modified_date in available QVD file.

As mentioned before, I have assumed that “Sales. qvd” is updated with data till 25-Aug-14. In order to identify the last modified date of “Sales. qvd”, following code can help:

Here, I have loaded the last updated QVD into the memory and then identifed the last modified date by storing maximum of “Modified_Date”. Next we store this date in a variable “Last_Updated_Date” and drop the table “Sales”. In above code, I have used Peek() function to store maximum of modified date. Here is it’s syntax:

Peek( FieldName, Row Number, TableName)

This function returns the contents of given field for a specified row from the internal table. FieldName and TableName must be given as a string and Row must be an integer. 0 denotes the first record, 1 the second and so on. Negative numbers indicate order from the end of the table. -1 denotes the last record.

Since we know the date after which the records will be considered as new records, we can Load incremental records of the data set (Where clause in Load statement) and merge them with available QVD (Look at the snapshot below).

Now, load updated QVD (Sales), it would have incremental records.

As you can see, two records of 26-Aug-14 were added. However, we have inserted a duplicate record also. Now we can say that, an INSERT only method does not validate for duplicate records because we have not accessed the available records.

Also, in this method we can not update value of existing records.

To summarize, following are the steps to load only the incremental records to QVD using INSERT only method:

1) Identify New Records and Load it
2) Concatenate this data with QVD file
3) Replace old QVD file with new concatenated table

2. Insert and Update method:

As seen in previous example, we are not able to perform check for duplicate records and update existing record. This is where, Insert and Update method comes to help:

In the data set above (Right table), we have one record (ID = PRD1458) to add and another one (ID = PRD858) to update (value of sales from 131 to 140). Now, to update and check for duplicate records, we need a primary key in our data set.

Let’s assume that ID is the primary key and based on modification date and ID, we should be able to identify & classify the new or modified records.

In order to execute this method, follow similar steps to identify the new records as we have done in INSERT only method and while concatenating incremental data with existing one, we apply the check for duplicated records or update the value of existing records.

Here, we have loaded only those records where Primary Key(ID) is new and use of Exists() function stops the QVD from loading the outdated records since the UPDATED version is currently in memory so values of existing records gets updated automatically.

Now, we have all unique records available in QVD with an updated sales value for ID(PRD858).

3. INSERT, UPDATE, & DELETE method:

The Script for this method is very similar to the INSERT & UPDATE, however here we have an additional step needed to remove deleted records.

We will load primary keys of all records from current data set and apply an inner join with concatenated data set (Old+Incremental). Inner join will retain only common records and therefore delete unwanted records. Let’s assume that we want to delete a record of (ID PRD1058) in the previous example.

Here, we have a data set with the addition of one record (ID PRD1458), modification of one record (ID PRD158) and deletion of one record (ID PRD1058).

End Notes :

In this article, we have discussed how incremental loads are better and provide an efficient way to load data as compared to FULL load. As a good practice, you should have regular backup of data because it may get impacted or a data loss can occur, if there are issues with database server and network.

Depending on your industry and need of the application, you can select, which method works for you. Most of the common applications in BFSI industry are based on Insert & Update. Deletion of records is normally not used.

Have you dealt with similar situation or have another hack to improve efficiency of Qlikview applications under your hat? If so, I’d love to hear your thoughts through comments below as it also benefit someone else trying to handle similar situation.

Month in cross table does not match date field

I am having an issue in matching periods/months that head a crosstable to dates in a text file. Attached are examples of my data files, the output [actual generated versus what I need] and the QV document.
I have tried to change the headings to actual month names, did not work.
I go for easy solutions. So if the headings of the crosstable need to be changed, that can be done.

Who can help me?

thanks
koti

Types of loads in Qlikview

Types of load can be explained frm various point of views.Some apply while loading data from database and some apply while restructuring your data model within qlikview.

1. Laod from file:
You can load your data from excel/msdb/txt etc files or by creating ODBC and connecting to your database directly.
e.g.
LOAD BudgetYear,
Country,
Budget
FROM
[C:\Freight Budget 1996.xls]
(biff, embedded labels, table is Data$);

2. Inline Load:
You can load data from file or can define the data within Qlikview and load (Inline) from there. The inline data can be defined in the Inline Data Wizard as: Insert > Load Data > Inline Data.
e.g.
LOAD * INLINE [
Display as
Dollars
Percentage
];

3. Resident Load:
You can load data intoa Qlikview table. Then you can use that table as a resident table and data from that table with transfomation and calculation in resident load.
e.g.
emp1:
Select Name,DOJ,Salary,Bonus from employee;
Load Name,month(DOJ),Salary+Bonus as 'Total Payout' resident emp1;

You can also load from an existing field or a succeeding table.

4. Incremental Load (Differential/Delta Load):
Suppose your database data volume is big and you do not want to load the whole dataset everyday. Rather you want to load only the new/changed records for optimization. Incremental load is th solution then.
In this case, the typical process is to load the new data from database, load the old data from QVD and combine into a new QVD (repeated for each table).
The implementation can be done in the following way (think about SCD)
i. Append only - tracked by number of records. The number of records previously read is tracked and only the records from the last recodr till EOF is appended
ii. Insert Only - I - (No Updt/Del) - The records inserted after the last execution of the script, are added. This requires an SCD2 like effective data field.
iii. Insert and Update - IU - (No Del) - Records inserted or updated after last script execution are taken care of. Needs an effective date and PK field.
iv. Insert Update and Delete - IUD - This is basically a full fledged sync with the database.

5. Binary Load:
Binary load is used to share the entire dtamodel between 2 QVWs. Basically, the datamodel of one QVW (Q1) is copied from RAM to disk in 0 and 1 form, for another QVW (Q2). Thus Q2 inherits entire data of Q1.
e.g. if you have a base QVW where the common metrices are designed and you want to enhance this to build more business specific dashboards, Binary load is a good option.
Also, if you want to do incremental load, the loading of high volume historical data can be done by binary mothod, to utilize its speed.
Point to be noted here is: Binary has tobe the first statement of the script.
e.g.
Binary order.qvw;
Binary c:\order.qvw;

6. Add load:
Simply put, this statement blindly appends data from one table to the data of another table, having similar signature, during partial reload. It does not check for any duplicate. Hence, ADD LOAD or ADD SELECT is usually follwed by distinct or a proper where clause.
e.g.
LOAD OrderID, OrderAmt from Order_May.csv;
ADD LOAD OrderID, OrderAmt from Order_June.csv;

This will simply concate data from Order_June to Order_May. But OrderID might be duplicated. Hence, this statement can be properly shaped to remove duplicate data as:

LOAD OrderID, OrderAmt from Order_May.csv;
ADD LOAD OrderID, OrderAmt from Order_June.csv Where Not Exists(OrderID);

7. Buffer load:
With the BUFFER prefix, QVD files are created and maintained autmatically (QVDs cache or buffer the result of the statement). This is handy while doing incremenal load.
The QVD name is an internal one; 160 bit hex hash name comprising the entire the followingload/select statement. It is stored in the location as set in the User Preferences > Locations.
e.g.
Buffer without option:
buffer select * from Table1;

Internally a QVD file is created and the content of Table1 is internally stored in the QVD. This would be used indefinitely untill some other method is used.

buffer (incremental) load * from MyLog.log;

Same as incremental load. It is a typical solution for log files (text). Not for DB files.

buffer (stale after 7 days) select * from Table1;

It overwrites the current QVD and make a full load after 7(n) days. It can be used with DB tables. This is the timestamp till the QVD would be used as suorce. After this it would be a full reload and regeneration of the QVD (?).

N.B. This is my first post. Please correct me if required.