Monday, February 7, 2011

MSBI Stuff


Nth Highest Salary Query


Problem: List all the employees from the employee table having the second highest (or, more generally, the Nth highest) salary in the organization.
Solution: We can achieve this in many ways, but in this post I would like to do it using correlated queries. Correlated queries are very interesting, as they give you the option to process data row by row. But at the same time this is their big drawback: row-wise processing creates a performance bottleneck for the query.
Query to get the list of employees getting 2nd highest salary (Set the value of @SalaryPosition to get Nth highest salary):

DECLARE @SalaryPosition INT
SET @SalaryPosition = 2
SELECT *
FROM Employee E1
WHERE @SalaryPosition =  
          (
            SELECT COUNT(DISTINCT E2.Salary) 
            FROM Employee E2
            WHERE E2.Salary >= E1.Salary
          )
Well, the query looks simple, but understanding it is a little complex. Let's explore the process behind it.
I have taken the variable @SalaryPosition to make the query generic, so the same query works for any salary position.
Before going further, let me explain the concept behind the correlated query.


Correlated Query: In short, it's a type of nested subquery where the inner query references some value of the outer query, and the inner query is executed once for each record of the outer query, making it slow. A familiar place you see correlated subqueries is with the IN () or EXISTS operators.
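For illustration, here is a minimal correlated subquery of that kind (a hypothetical example against the employee table described below, not part of the original solution): it lists employees whose salary is shared by at least one other employee.

SELECT *
FROM dbo.employee E1
WHERE EXISTS (
        SELECT 1
        FROM dbo.employee E2          -- inner query references E1, so it is correlated
        WHERE E2.salary  = E1.salary
          AND E2.empname <> E1.empname
      )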
Our query to get the 2nd highest salary works in the same way. Let's dive a little deeper.
Sample data used in employee table:
slNo    empname     salary
101     Ram         5000
102     Abhishek    7500
101     kumar       5000
104     prasad      6570
102     Jumla       7500
101     Harkesh     12000
101     John        4000
Run the below query to create the table in your database and to insert data into it.

CREATE TABLE [dbo].[employee](
        [slNo] [INT] NULL,
        [empname] [VARCHAR](50) NULL,
        [salary] [FLOAT] NULL
)
INSERT INTO [dbo].[employee]([slNo],[empname],[salary])
SELECT '101', 'Ram', '5000' UNION All
SELECT '102', 'Abhishek', '7500' UNION All
SELECT '101', 'kumar', '5000' UNION All
SELECT '104', 'prasad', '6570' UNION All
SELECT '102', 'Jumla', '7500' UNION All
SELECT '101', 'Harkesh', '12000' UNION All
SELECT '101', 'John', '4000'
So we have taken two aliases for the employee table, i.e. E1 and E2.
Employee table split
Now let's see how to get the names of the employees with the 2nd highest salary.
Query:

DECLARE @SalaryPosition INT
SET @SalaryPosition = 2
SELECT *
FROM Employee E1
WHERE @SalaryPosition =  
           (
             SELECT COUNT(DISTINCT E2.Salary) 
             FROM Employee E2
             WHERE E2.Salary >= E1.Salary
           )
Here E1 is the outer query and E2 is the inner query.


Reasoning: Let's take the first record in table E1, where the salary is 5000, and assume it is the second highest salary. But how will you know whether it actually is the second highest salary? If 5000 is the second highest salary, then the distinct count of all the salaries from table E2 that are greater than or equal to 5000 must be 2.
Employee table pointing to 5000
The first record from E1 (salary = 5000) will be compared to all the records in E2 (salary).
Table E1 (considering the first record only): 5000
Table E2 (distinct salaries): 4000, 5000, 6570, 7500, 12000
Distinct count where salary >= 5000: 4


Hence 5000 is not the second highest salary, as there are 4 distinct salaries greater than or equal to it. But we can conclude that 5000 is at the fourth position.
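You can verify this manually. Assuming the sample table above, the inner query for the first record boils down to the following, which returns 4:

SELECT COUNT(DISTINCT salary)
FROM dbo.employee
WHERE salary >= 5000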
Now let’s take the second salary from table E1 and do the same comparison as in the earlier case.
Employee table pointing to 7500
The second salary is 7500 and, as per the rule, the count of distinct salaries greater than or equal to 7500 must be 2.
Table E1 (considering the second record only): 7500
Table E2 (distinct salaries): 4000, 5000, 6570, 7500, 12000
Distinct count where salary >= 7500: 2



Here, in this case, the condition in the WHERE clause of the outer query is satisfied. Let's dissect the query for this particular case.
Original Query:

DECLARE @SalaryPosition INT
SET @SalaryPosition = 2
SELECT *
FROM Employee E1
WHERE @SalaryPosition =  
           (
            SELECT COUNT(DISTINCT E2.Salary) 
            FROM Employee E2
            WHERE E2.Salary >= E1.Salary
           )
Step 1:

SELECT *
FROM Employee E1
WHERE 2 =  
           (
             SELECT COUNT(DISTINCT E2.Salary) 
             FROM Employee E2
             WHERE E2.Salary >= E1.Salary
           )
Step 2: (Where the salary is 7500, the inner query returns 2, as only two distinct salaries in table E2 are greater than or equal to 7500.)

SELECT *
FROM Employee E1
WHERE 2 =  
              (
                2
              )
Step 3: (Now 2 = 2, so the outer query condition is satisfied; hence it returns the corresponding record in the result set.)
Flow of query
Once it returns the record (102, Abhishek, 7500), it iterates through the other records in table E1 to check whether any other record satisfies the same condition. In our example two records satisfy the condition "where 2 = 2", hence both records are returned to the result set.
Finally, we get all the employee information for the second highest salary.
If you want to check for the highest salary or the Nth highest salary, just change the value of @SalaryPosition accordingly.
Other ways to get the list of employees with the 2nd highest salary in the organization are:

SELECT *
FROM (
       SELECT DENSE_RANK() OVER(ORDER BY salary DESC) AS RankID, *
       FROM dbo.Employee
     ) InnQ
WHERE InnQ.RankID = 2


SELECT * FROM dbo.Employee WHERE salary = 
        (
               SELECT MAX(salary) FROM dbo.Employee WHERE salary < 
               (SELECT MAX(salary) FROM dbo.Employee)
        )


SELECT * FROM dbo.Employee WHERE salary = 
        (       
               SELECT MAX(salary) FROM dbo.Employee WHERE salary NOT IN 
               (SELECT MAX(salary) FROM dbo.Employee)
        )
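One more variant (a sketch, not from the original post) uses TOP with a derived table: it picks the minimum of the top @SalaryPosition distinct salaries, so it also works for any N.

DECLARE @SalaryPosition INT
SET @SalaryPosition = 2
SELECT *
FROM dbo.Employee
WHERE salary = (
        SELECT MIN(salary)
        FROM (
               SELECT DISTINCT TOP (@SalaryPosition) salary
               FROM dbo.Employee
               ORDER BY salary DESC
             ) AS TopSalaries   -- the @SalaryPosition highest distinct salaries
      )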


T-SQL RowNumber() vs Rank() vs DenseRank() vs NTILE()


Wouldn't it be nice if some of the things in life that we want done could go so smoothly and so perfectly, without any flaw or fracas, as if a wizard or some magic were making them simpler and finer?
SQL Server comes with some magic words of this kind that simplify the life of a developer or DBA to a great extent. These magic words are ROW_NUMBER(), RANK(), DENSE_RANK() and NTILE().


We can do a lot more with these functions than their names suggest: removing duplicates from a database, getting the highest or second highest salaried employees in the organization, generating an on-the-fly sequence number or identity column, or simply ranking any column on the basis of its values, etc. The ranking functions assign a rank to each record as per the condition specified. Each of these ranking functions has its own meaning and purpose.
Let's have a table to explain the effects with examples. Consider the dbo.employee table below:
DeptNo    empname     salary
101       Ram         5000
102       Abhishek    7500
101       kumar       5000
104       prasad      6570
102       Jumla       7500
101       Harkesh     12000
101       John        4000
Run the below query to create the table in your database and to insert data into it.
CREATE TABLE [dbo].[employee](
        [DeptNo] [INT] NULL,
        [empname] [VARCHAR](50) NULL,
        [salary] [FLOAT] NULL
)
INSERT INTO [dbo].[employee]([DeptNo],[empname],[salary])
SELECT '101', 'Ram', '5000' UNION All
SELECT '102', 'Abhishek', '7500' UNION All
SELECT '101', 'kumar', '5000' UNION All
SELECT '104', 'prasad', '6570' UNION All
SELECT '102', 'Jumla', '7500' UNION All
SELECT '101', 'Harkesh', '12000' UNION All
SELECT '101', 'John', '4000'
Here in this article, I will elaborate each of these functions in brief and with examples.
Each of these functions has a similar syntax in T-SQL:
ROW_NUMBER() / RANK() / DENSE_RANK() / NTILE(n) OVER ( [ <partition_by_clause> ] <order_by_clause> )
Here <partition_by_clause> is optional, but it is very useful when dealing with complex logic. It partitions the result set into subsets on which the ranking functions work; the ranking (windowed) functions then work on each partition separately. If we don't specify the PARTITION BY clause, the whole result set is treated as a single partition.
When we use “Partition by “, we need to specify the columns on which it will partition the data.
For example, in the above dbo.employee table, if we want to partition by DeptNo column, then the employee table will be divided into 3 partitions as shown below.
Employee table partitioned by DeptID
Now the ranking functions / windowed functions will affect each of these partitions separately.
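As a quick combined illustration (a sketch against the dbo.employee table above; the column aliases are mine), the following query applies all four functions, partitioned by DeptNo:

SELECT DeptNo, empname, salary,
       ROW_NUMBER() OVER (PARTITION BY DeptNo ORDER BY salary DESC) AS RowNum,
       RANK()       OVER (PARTITION BY DeptNo ORDER BY salary DESC) AS RankNo,
       DENSE_RANK() OVER (PARTITION BY DeptNo ORDER BY salary DESC) AS DenseRankNo,
       NTILE(2)     OVER (PARTITION BY DeptNo ORDER BY salary DESC) AS TileNo
FROM dbo.employee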
The OVER clause prepares the result set before the associated ranking or windowed function is applied, i.e. partitioning the data set and/or ordering it (the ORDER BY clause) on certain criteria.
Let’s check each of the ranking functions one by one.
Click to get info on ROW_NUMBER()
Click to get info on RANK()
Click to get info on DENSE_RANK()
Click to get info on NTILE()

Programming Foreach Loop Container – Enumerating Excel Sheets

In my previous article, "Programming Foreach Loop Container – Enumerating Excel Files", we discussed enumerating the Excel files in a folder and loading data from each Excel file to the destination by using the Foreach File enumerator.
But there may be cases where the data comes as multiple sheets in a single Excel file; for example, the total sales of study materials across different countries.
To load all the sheets of a single Excel file in a Foreach Loop Container, we have to use the "Foreach ADO.NET Schema Rowset Enumerator". This special enumerator is used to enumerate schema information about a data source, i.e. the schema information exposed by the OLE DB provider: for example, the list of tables, schemas, column information, constraints, etc. of a database.
Excel Sheets example
Below are the step-by-step operations to configure the package for enumerating the sheets of an Excel file.
Step 1: Create a package with the basic components and tasks. Refer my earlier article for details. Programming Foreach Loop Container – Enumerating Excel Files
Step 2: Create a variable named "CurrentExcelSheet" of String type with the default value "India$" (one of the sheet names) to avoid any validation error.
Step 3: Create connection managers for the source (Excel connection manager) and for the destination (OLE DB connection manager).
Connection Managers
Step 4: Create an ADO.NET connection manager to access the Excel file in the Foreach ADO.NET Schema Rowset Enumerator.
Right click on the connection manager area and select "New ADO.NET Connection...".
ADO .net connection


In the connection manager dialog box, select the "Microsoft Jet 4.0 OLE DB Provider" so that we can access the Excel file through this OLE DB driver.
Microsoft Jet4.0 OLE DB Provider
Then in the “Database file name” section select the Excel file that contains information about all the sales. (If necessary select “All Files *.*” in the file open dialog box.)
Database file name
Leave the User name and Password fields as they are. Click on the "All" pane at the left side and set the value of "Extended Properties" to "Excel 8.0".
Extended Properties
Then test the connection and press OK.
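For reference, the ADO.NET (OLE DB) connection string produced by these settings should look roughly like the following; the file path here is only a placeholder for your own workbook:

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\SampleExcelFiles\Sales.xls;Extended Properties="Excel 8.0"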
Step 5: Create a "Foreach Loop Container" and set the "Enumerator" in the Collection pane to "Foreach ADO.NET Schema Rowset Enumerator". Two more properties have to be set here:
a) Connection: Select an ADO.NET connection manager in the list. Here in our case, the connection manager will be “Ado .net for Excel” that we created above.
b) Schema: Select the schema to enumerate. For our case we set “Tables” as we need to enumerate all the sheets of the given excel file.
Keep the "Set Restrictions" settings as they are.
For Each Loop Editor
Step 6: Then we have to set the Variable Mappings pane. Here we map the "User::CurrentExcelSheet" variable to index 2. Index 2 is used because the table (sheet) name is in the 3rd column of the result set generated by the "Foreach ADO.NET Schema Rowset Enumerator", and the columns of the result set use 0-based indexing.
Step 7: Create the data flow task named "Sales", with one source named "Excel Source (Sales)". The table name for the source is fetched from the variable "User::CurrentExcelSheet", so that on each iteration we get the next sheet of the Excel file.
Data Flow Task
For this we have to set the “Data access mode:” to “Table name or view name variable” and the variable name to “User::CurrentExcelSheet”.
Excel Source Editor
Step 8: Create the destination component and all other components as per the requirement and run the package. For more information about the other components, refer my article named “SSIS Programming Basic”.
Control Flow
So we have successfully executed the package for extracting data from the different sheets of an Excel file.
Let's have a look at the procedure to create the above logic by using Integration Services programming. The code goes the same way as in my previous article, "Programming Foreach Loop Container – Enumerating Excel Files", except for some modifications in the enumerator area, where we use the "Foreach ADO.NET Schema Rowset Enumerator".
In the Excel source area, the data is extracted from a variable instead of an Open Rowset. The code is self-descriptive and easy to understand, and is given in both C# .NET and VB .NET for both SQL Server 2005 and SQL Server 2008.

TRANSACTION Isolation Levels in SQL Server
SQL Server 2005 has some unique features to deal with transactions in the database world: discrete mechanisms to cover every kind of transaction and to isolate transactions from deadlocks and crashes.
Before going deeper into the isolation levels that SQL Server provides to distinguish types of transactions, let's have a look at the definition of a TRANSACTION. What does a transaction mean in the real world, and in a database scenario?


Transaction: When you give something to me and I take it, that's a transaction. When you withdraw money from an ATM machine and receive the money, that is also a kind of transaction. What I am trying to get at is a simple question: is the transaction above valid and consistent? What if I deny having received anything from you; maybe you gave it to someone else instead of me? What if, after you withdraw money from your account, your account balance still shows the same amount as before? (Oh! In that case you are lucky indeed :)). And what happens if you and your partner withdraw all your money from a joint account at the same time from different ATMs?
So there must be some methodology to keep track of all these things and manage them properly; even under disastrous conditions, the database and the information regarding any transaction must remain in a consistent state.
To achieve this in a database system, we have the locking mechanism. It works like this: suppose there is a room that is electronically locked, and only a person who knows the password can enter, provided the room is empty; otherwise he has to wait until the room is vacated by the other person. But here we have a little controversy: the person waiting outside may have a different task than the person who is already inside, and it may be that the two tasks do not interfere with each other at all, or interfere only slightly, in a manageable way. So we may conclude that the security system must provide different types of security codes or passwords to the corresponding persons. Let's have a deeper look at this.
Suppose you are doing a transaction to withdraw money from an ATM machine, and at the same time the bank manager is doing a routine check of your transactions, which is a totally different operation, and at the same time the bank teller is checking your account for the remaining balance. All these operations are different, but they access the same entity or resource: your account information kept inside the database. Of these operations, only yours writes to the database, as you are withdrawing money and the remaining balance has to be updated. So a proper security mechanism must be implemented here to ensure that these operations run smoothly and without conflict. Here the security can be ensured by putting locks (and, of course, the right types of locks) on each type of operation, which means you are isolating the resources from other transactions that may hamper their consistency. This is where isolation levels come in.
The isolation levels are categorized depending on the types of locks used at a particular level. At lower isolation levels more users can access the same resource without conflict, but they may face concurrency-related issues such as dirty reads and data inaccuracy (described below). At higher isolation levels these issues are eliminated, but only a limited number of users can access the resource.
Let's have a look at locks and the types of locks. A lock can be treated as a policy that prevents you, or a process, from performing any action (one that may conflict with other actions) on an object or resource if that object or resource is already occupied by another process or user. It's something like proposing to someone who is already with someone else. But the situation matters (maybe you are lucky enough): it depends on what you are going to do and on what the other person is doing. For such situations, we have different types of locks.
Types of Locks:
  • Shared locks (S): This lock is useful when you are doing read operations and no manipulations like write operations (update/delete/insert). It is compatible with other shared locks, update locks and intent shared locks. It prevents dirty reads (described below).
  • Exclusive locks (X): These locks are the possessive type: they are not compatible with any other locks. An exclusive lock cannot be acquired if any other lock is already held on the resource, nor will it let other locks be taken on the resource until it finishes its job. This lock is used for data-modification operations such as INSERT, UPDATE or DELETE.
  • Update locks (U): This can be treated as a mixture, a collaboration, of the above two locks (shared and exclusive). Let's take an example: you are going to perform an update operation on a table at row number 23. You are doing two types of operation here: one is searching for record 23, which can be achieved with a shared lock; the other is updating the record once it has been found, which requires an exclusive lock. The update lock transforms into an exclusive lock when it finds its target; otherwise it behaves like a shared lock. This prevents deadlocks to a great extent. This lock is compatible with intent shared and shared locks.
  • Intent locks (also called demand locks): These are used to establish a lock hierarchy. They protect the placing of a shared (S) lock or exclusive (X) lock on a resource lower in the lock hierarchy. For example, suppose you are performing a read operation on a piece of data with a shared lock, and at the same time another user wants to modify the data with an exclusive lock. Since shared locks are compatible with other shared locks, any number of shared locks can be obtained on a piece of data, and hence the user with the exclusive lock may wait indefinitely until all the shared-lock operations complete. To avoid this starvation, intent locks are very useful: if the second user comes in with an intent exclusive lock, no other transaction can grab a new shared lock, and he can claim the exclusive lock once the first transaction completes.
There are basically three types of Intent Locks that are most popular:
a) Intent Shared Lock(IS)
b) Intent exclusive (IX)
c) Shared with intent exclusive (SIX)
To get more information on Intent Locks, refer the link below:
http://msdn.microsoft.com/en-us/library/aa213039(SQL.80).aspx
  • Schema locks: These locks protect the schema of the database. They deal with DDL (Data Definition Language) commands like adding or dropping columns of a table, renaming or dropping a table, and with blocking any DDL operation during the execution of a query. There are two types of schema locks:
a) Schema modification (Sch-M): This lock is applied when the SQL Server engine is modifying the structure of the schema, like adding or dropping the columns of a table. During this period, if any other transaction tries to access that object, it will be denied or delayed.
b) Schema stability (Sch-S): This indicates that a query using the table is being compiled. It does not block any transactional locks, like shared or exclusive locks, from operating on the data; but while the query is running, it prevents the execution of any DDL commands on that table.
  • Bulk update locks: This lock is useful while performing a BULK operation on a table, like BULK INSERT. It prevents any other normal T-SQL operations from executing on the table, except BULK processing of the data. (A sketch for observing some of these lock types follows below.)
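To watch some of these lock types in action, you can query the lock DMV while a transaction is open. A minimal sketch, using the dbo.employee table from the earlier examples:

BEGIN TRAN
    -- The update takes an exclusive (X) lock on the affected row(s)
    UPDATE dbo.employee SET salary = salary WHERE empname = 'Ram'

    -- List the locks currently held by this session
    SELECT resource_type, request_mode, request_status
    FROM sys.dm_tran_locks
    WHERE request_session_id = @@SPID
ROLLBACK TRAN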



Now let us explore some buzzwords in Isolation Level:
Lost updates: These generally occur when more than one transaction tries to update a specific record at a time, i.e. when one update is successfully written to the database but a second update from a different transaction accidentally overwrites the previous one. This is called a lost update.
Non-repeatable reads (also called inconsistent analysis): This is about dealing with inconsistent data: suppose you read a value from a table and start working on it, but meanwhile some other process modifies the value at the source, resulting in a false output in your transaction; that is a non-repeatable read. For a more practical example, suppose that before withdrawing money from your account you always perform a balance check, and you find 90$ in your account. You then try to withdraw 60$, but meanwhile the bank manager debits 50$ from your account as a penalty for not keeping the minimum balance (100$); as a result, you now have only 40$ in your account. So your transaction either fails, as the demanded amount (60$) is not there, or it would show -20$ (which is quite impossible given banking constraints :)). More simply, a non-repeatable read takes place when a transaction reads the same row several times and gets a different value each time.
Repeatable reads: This specifies that a transaction cannot read data that has been modified by other transactions but not yet committed, and that if the current transaction has read some data, no other transaction can modify that data until the current transaction completes.
Phantom reads: Don't be afraid, we are not talking about ghosts or the Phantom of the Opera. Here phantom means unexpected or unrealistic. A phantom read occurs when two identical queries are executed and the set of rows returned by the second query is different from the first. Let's have a simple example: suppose your bank's policy changes, and according to the new policy the minimum balance should be 150$ instead of 100$ for each account type. This is not a big deal for a database administrator: he performs an update statement for each account type where the minimum balance is less than 150$ and sets the value to 150$. But unfortunately, when the manager checks the database, he finds one record in the same table with a minimum balance of less than 150$. The DBA is surprised: how is this possible, when he performed the update statement on the whole table?
This is called a phantom read. Phantom reads are very rare, as they need the proper circumstances and timing: in the above example, someone may have inserted a new record with a minimum balance of less than 150$ at the very moment the DBA executed the UPDATE statement. And as it was a new record, it didn't interfere with the UPDATE transaction, which executed successfully. This type of phantom read can be avoided by using a higher level of isolation, i.e. SERIALIZABLE (described below).
Dirty reads: This is a variety of non-repeatable read. It happens when a process reads a piece of data while some other process is performing update operations on that piece of data and has not yet completed them.

Now coming to the root point of the article, the isolation levels: we have basically five isolation levels in SQL Server 2005. Each one is described below.
We will use one simple example for all the cases below. The data shown in the table is assumed and used for example purposes only; it may or may not be right in a real scenario. The table information is given below:
Database Name: OLAP
Table Name: dbo.car_info
Table Column Information:
Column_name     Type
Car_Sl_No       int
CarCompany      varchar
CarBodyType     varchar
CarName         varchar
EngineType      varchar
Table Data:
Car_Sl_No   CarCompany   CarBodyType   CarName        EngineType
1           Maruti       small         Maruti-800     petrol
2           Honda        sedan         City           petrol
3           Maruti       small         Maruti-800     petrol
4           Maruti       small         Waganor Duo    petrol
5           Honda        sedan         City           petrol
6           TATA         small         indica         diesel
7           Mahindra     SUV           Scorpio        diesel
8           TATA         SUV           Sumo           diesel
9           Maruti       sedan         SX4            petrol
10          Maruti       sedan         Swift-Dzire    diesel
11          TATA         small         Nano           petrol
Assumption: In all our examples, the two different transactions can be thought of as being run by two different users. For testing, you can achieve this with two separate query windows or two separate instances of SQL Server Management Studio (SSMS). But you have to be careful to run the queries for both connections simultaneously, or at least immediately one after the other.
1. READ UNCOMMITTED Isolation Level: This is very useful in case you need higher concurrency in the transactions. Here one transaction can access the data that has been modified by the second transaction even if the second transaction is not committed.
Syntax:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Example: Suppose User1 is trying to update the EngineType from 'petrol' to 'diesel' for the Car_Sl_No value 2, and at the same time User2 is trying to read the data for the Car_Sl_No value 2. Under the default setting, User2 cannot read the data from that row while the update is uncommitted. But if User2 sets the transaction isolation level to 'Read Uncommitted', it is possible to read that row, with the updated information, even though the transaction has not been committed by User1.
For User1:
USE OLAP
Go
BEGIN TRAN
UPDATE [OLAP].[dbo].[car_info]
   SET [EngineType] = 'diesel'
 WHERE Car_Sl_No = 2
Here, note that the transaction is still running, as there is no commit statement in the above code. Under the default setting, the query run by User2 will keep executing until User1 commits the transaction.
For User2:
USE OLAP
Go
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
--The above statement is used to read the updated value even if the transaction is not committed.
SELECT [Car_Sl_No]
      ,[CarCompany]
      ,[CarBodyType]
      ,[CarName]
      ,[EngineType]
  FROM [OLAP].[dbo].[car_info]
WHERE Car_Sl_No = 2
As in the above code we set the transaction isolation level to 'Read Uncommitted', User2 can access that record with the updated data.
Output:
Output 1
Although it increases the concurrency of transactions, did you notice the disadvantage? What if User1 rolls back his transaction, or if somehow User1's Management Studio crashes or hangs? (As the transaction is not committed yet, it will roll back by itself, leaving User2 with a false or inconsistent value.)
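As an aside, a per-table shortcut with the same effect (and the same dirty-read risk) is the NOLOCK table hint; a sketch:

SELECT [Car_Sl_No], [EngineType]
FROM [OLAP].[dbo].[car_info] WITH (NOLOCK)  -- reads uncommitted data for this table only
WHERE Car_Sl_No = 2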
Limitations:
  • Dirty-reads
  • Lost Updates
  • Phantom reads
  • Non-repeatable reads
Advantages:
  • Higher Concurrency
In SSIS (SQL Server Integration Service): To achieve the above norm in SSIS, select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “ReadUncommitted”.
SSIS ReadUncommitted
The benefit here is that more than one task can access the same table simultaneously in case of parallel execution of the package.
2. READ COMMITTED Isolation Level: This is the default level in SQL Server 2005 and the immediate higher level above READ UNCOMMITTED. It prevents transactions from reading data while some other transaction is performing update operations on that data, and thus eliminates dirty reads: reading of uncommitted data is prevented. But it still suffers from other demerits, like lost updates.
Syntax:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
Example: Considering our previous example, suppose the EngineType for the Car_Sl_No value 2 is NULL and User1 is trying to update the EngineType to 'petrol'; but at the same time User2 starts a new transaction, reads the value as NULL, and updates the record to 'diesel' before the transaction is committed by User1. As a result, User1's update is lost; it is overwritten by User2.
For User1:
USE OLAP
Go
BEGIN TRAN

DECLARE @EngineType VARCHAR(20)
SELECT @EngineType = [EngineType] FROM [OLAP].[dbo].[car_info] WHERE Car_Sl_No = 2
--The WAITFOR below stands in for the other operations that User1 performs in this transaction.
WAITFOR DELAY '00:00:10' --For achieving real-time concurrency in this example
IF @EngineType IS NULL
BEGIN
UPDATE [OLAP].[dbo].[car_info]
   SET [EngineType] = 'petrol'
 WHERE Car_Sl_No = 2
END
ELSE
BEGIN
        PRINT 'Record is already updated'
END

COMMIT TRAN
For User2:
USE OLAP
Go
BEGIN TRAN

DECLARE @EngineType VARCHAR(20)
SELECT @EngineType = [EngineType] FROM [OLAP].[dbo].[car_info] WHERE Car_Sl_No = 2
--The waitfor statement is the same for User2 as well
WAITFOR DELAY '00:00:10' --For achieving real-time concurrency in this example
IF @EngineType IS NULL
BEGIN
UPDATE [OLAP].[dbo].[car_info]
   SET [EngineType] = 'diesel'
 WHERE Car_Sl_No = 2
END
ELSE
BEGIN
        PRINT 'Record is already updated'
END

COMMIT TRAN
Here both users successfully updated the value, but the value updated by User2 persists, and User1's update is lost (see the sketch after the output below).
Output: The final output for the record is
Output 2
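As an aside, one common way to avoid this particular lost update at the READ COMMITTED level is to take an update lock while reading, so that the second reader blocks until the first transaction finishes. A minimal sketch, not part of the original example:

DECLARE @EngineType VARCHAR(20)
-- UPDLOCK makes the read incompatible with other UPDLOCK readers,
-- so two such transactions cannot both see NULL and then overwrite each other.
SELECT @EngineType = [EngineType]
FROM [OLAP].[dbo].[car_info] WITH (UPDLOCK)
WHERE Car_Sl_No = 2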
Limitations:
  • Lower Concurrency than ReadUncommitted
  • Lost Updates
Advantage:
  • Eliminates Dirty Reads
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “ReadCommitted”.
SSIS ReadCommitted
3. REPEATABLE READ Isolation Level: This is the next higher level above the previous isolation level, and the main point here is that it does not release the shared lock once the transaction starts reading data. In simple terms, a transaction cannot read data that has been modified by another transaction but not yet committed; also, no other transaction can modify data that has been read by the current transaction until the current transaction completes. At this isolation level the concurrency rate is lower. As a result it eliminates lost updates, non-repeatable reads, etc., but it still has one big problem, called the phantom read. Let's have an example to elaborate on this.
Syntax:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
Example: Suppose the manager of a showroom decides to transfer all the cars manufactured by Honda to another showroom and to maintain a proper record of this operation. We add one more column, 'TransferredSatus', to indicate whether a car has been transferred or not. Here the DBA checks for the presence of any Honda cars in the record that are not yet transferred by checking the value of the column 'TransferredSatus'. If he finds some, the corresponding transfer operations are performed and the record is updated to '1' (i.e. transferred). By using the 'Repeatable Read' isolation level, we can eliminate lost updates, dirty reads and non-repeatable reads. But what if, at the time of updating the database, someone else from the inventory system inserts a record for a new Honda car that has just arrived at the showroom? Let's see the effect.
For User1:
USE OLAP
Go
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN

        --check for the existence of Honda company cars
        DECLARE @Car_Sl_No INT
        DECLARE TransferingCarsCursor CURSOR FOR
        SELECT Car_Sl_No FROM dbo.car_info WHERE CarCompany = 'Honda' and TransferredSatus = 0

        OPEN TransferingCarsCursor

        FETCH NEXT FROM TransferingCarsCursor
        INTO @Car_Sl_No
        WHILE @@FETCH_STATUS = 0
        BEGIN
               ----------------------------------
               ------Car transfering operations--
               ----------------------------------
        FETCH NEXT FROM TransferingCarsCursor
               INTO @Car_Sl_No
        END
        CLOSE TransferingCarsCursor
        DEALLOCATE TransferingCarsCursor

        WAITFOR DELAY '00:00:10' --For achieving real-time concurrency in this example
        -- This is the time when the other user inserts a new record for the new Honda car.

        UPDATE dbo.car_info
               SET TransferredSatus = 1 WHERE CarCompany = 'Honda' and TransferredSatus = 0

COMMIT TRAN
Here it found only 2 records from Honda Company.
For User2:
USE OLAP
Go
BEGIN TRAN
INSERT INTO [OLAP].[dbo].[car_info]
           ([CarCompany]
           ,[CarBodyType]
           ,[CarName]
           ,[EngineType]
           ,[TransferredSatus])
     VALUES
           ('Honda','sedan','Civic GX','petrol',0)

COMMIT TRAN
But during the execution of the transaction by User1, User2 inserts a new record for the new Honda car. Assume the record is inserted before the UPDATE statement of User1; as a result, instead of updating only 2 records, User1 updates the new record as well, along with the earlier records, showing wrong information in the chart. This is called a 'phantom read'. Even the 'Repeatable Read' isolation mode can't resolve this problem; for this, you need to implement a higher isolation level, i.e. SERIALIZABLE.
Output for User1:
(3 row(s) affected)
Limitations:
  • Lower Concurrency
  • Phantom Reads
Advantage:
  • Eliminates Dirty Reads
  • Eliminates Lost Updates
  • Eliminates Non-Repeatable Reads
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “RepeatableRead”.
SSIS RepeatableRead
4. SERIALIZABLE Isolation Level: This is the highest isolation level, and as a result the concurrency rate is low. But it eliminates all the issues related to concurrency, like dirty reads, non-repeatable reads, lost updates and even phantom reads. According to this isolation level:
  1. Statements cannot read data if other transactions are performing update operations on the data and is not committed yet.
  2. Also no other transactions can perform any update operations until the current transaction completes its read operations.
  3. And the important point here is that it performs a range lock based on the filters used to get the data from the table, i.e. it locks not only the current records but also new records that fall under the current filter condition. In simple language, no other transaction can insert new rows that fall under the current filter condition until the transaction completes (see the sketch after this list).
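As an aside, the same behavior can be requested per table with the HOLDLOCK hint (equivalent to the SERIALIZABLE level for that table); a sketch:

SELECT Car_Sl_No
FROM dbo.car_info WITH (HOLDLOCK)  -- holds shared/range locks until the end of the transaction
WHERE CarCompany = 'Honda' AND TransferredSatus = 0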
Considering our previous example, we will set the isolation level to Serializable.
Syntax:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
For User1:
USE OLAP
Go
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN

        --check for the existence of Honda company cars
        DECLARE @Car_Sl_No INT
        DECLARE TransferingCarsCursor CURSOR FOR
        SELECT Car_Sl_No FROM dbo.car_info WHERE CarCompany = 'Honda' and TransferredSatus = 0

        OPEN TransferingCarsCursor

        FETCH NEXT FROM TransferingCarsCursor
        INTO @Car_Sl_No
        WHILE @@FETCH_STATUS = 0
        BEGIN
               ----------------------------------
               ------Car transfering operations--
               ----------------------------------
        FETCH NEXT FROM TransferingCarsCursor
               INTO @Car_Sl_No
        END
        CLOSE TransferingCarsCursor
        DEALLOCATE TransferingCarsCursor

        WAITFOR DELAY '00:00:10' --For achieving real-time concurrency in this example
        -- This is the time when the other user inserts new record about new Honda car.

        UPDATE dbo.car_info
               SET TransferredSatus = 1 WHERE CarCompany = 'Honda' and TransferredSatus = 0

COMMIT TRAN
For User2:
USE OLAP
Go
BEGIN TRAN
INSERT INTO [OLAP].[dbo].[car_info]
           ([CarCompany]
           ,[CarBodyType]
           ,[CarName]
           ,[EngineType]
           ,[TransferredSatus])
     VALUES
           ('Honda','sedan','Civic GX','petrol',0)

COMMIT TRAN
Output for User1:
(2 row(s) affected)
Here the User2 transaction waits until the User1 transaction completes, avoiding phantom reads.
Limitations:
  • Lower Concurrency
Advantage:
  • Eliminates Dirty Reads
  • Eliminates Lost Updates
  • Eliminates Non-Repeatable Reads
  • Eliminates Phantom Reads
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “Serializable”.
SSIS Serializable
5. SNAPSHOT Isolation Level: This specifies that the data accessed by any transaction is consistent and valid for that particular transaction, and that the data stays the same throughout the whole transaction. It implements row versioning to isolate the data for each transaction, i.e. it keeps a separate version of each row modified by the transaction in the tempdb database, dedicated to that transaction. Any update of the data in the original row will not affect the current transaction.
The ALLOW_SNAPSHOT_ISOLATION database option must be set to ON before you can start a transaction that uses the SNAPSHOT isolation level. It is kept OFF by default because of performance concerns.
To enable SNAPSHOT isolation level, use the below alter database command.
ALTER DATABASE OLAP SET ALLOW_SNAPSHOT_ISOLATION ON
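You can confirm the setting afterwards with a quick check against the catalog view (assuming the OLAP database from this example):

SELECT name, snapshot_isolation_state_desc
FROM sys.databases
WHERE name = 'OLAP'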
We will consider a small example to illustrate the above condition.
Syntax:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
Example: We will try to insert a new record in the [car_info] table by User1 and at the same time we will try to fetch the records by User2.
For User1:
USE OLAP
Go
BEGIN TRAN
INSERT INTO [OLAP].[dbo].[car_info]
           ([CarCompany]
           ,[CarBodyType]
           ,[CarName]
           ,[EngineType]
           ,[TransferredSatus])
     VALUES
           ('Honda','sedan','Civic Hybrid','petrol',0)
Note: The above transaction is not committed yet.
For User2:
USE OLAP
Go
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRAN
SELECT * FROM dbo.car_info WHERE CarCompany = 'Honda'
COMMIT TRAN
Output for User1:
(1 row(s) affected)
Output for User2:
Output 3
One record is successfully inserted by User1, but a consistent version of the previous data was kept in the version store (in tempdb) before the start of the transaction. So User2 is accessing the data from the version store and cannot see the newly inserted record.
Now commit the transaction for User1 with the "COMMIT TRAN" command, and run the transaction for User2 again; the output will be as below:
Output 4
You can check the version store for the current transaction, along with other information about it, by running the below DMV (Dynamic Management View) query before committing the User1 transaction.
SELECT * FROM sys.dm_tran_active_snapshot_database_transactions
Output:
Output 5
Limitations:
  • Low performance due to versioning in tempdb
Advantage:
  • Eliminates Dirty Reads
  • Eliminates Lost Updates
  • Eliminates Non-Repeatable Reads
  • Allows multiple updates by versioning
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “Snapshot”.
SSIS Snapshot
Other Isolation Levels in SSIS:
  • Chaos Isolation Level: This behaves the same way as ReadUncommitted, with the additional features stated below:
  1. It permits viewing uncommitted changes made by other transactions.
  2. It checks for uncompleted update transactions running at more restrictive isolation levels, to ensure no conflicts are raised, i.e. pending changes from more highly isolated transactions cannot be overwritten.
  3. Rollback is not supported at this isolation level.
If you want to perform read operations over the data only once per transaction, go for the Chaos isolation level.
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “Chaos”.
SSIS Chaos
  • Unspecified Isolation Level: When the isolation level of a transaction cannot be determined, it falls under the 'Unspecified' isolation level, i.e. a different isolation level than the ones above is being used. For example, when performing a custom transaction operation such as an ODBC transaction, if the transaction level is not set by the user, it executes according to the isolation level associated with the ODBC driver.
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “Unspecified”.
SSIS Unspecified
Optimistic Vs Pessimistic:
Optimistic concurrency: Here SQL Server assumes that the occurrence of resource conflicts between different transactions is very rare, but not impossible. So it allows transactions to execute without locking any resources. Only in the case of data modifications does it check for conflicts; if it finds any, it performs the locking operations accordingly. In simple terms, we assume that every transaction will carry on without any problem, except in some exceptional cases.
Pessimistic concurrency: Here SQL Server locks resources irrespective of the type of transaction, to ensure the successful completion of transactions without deadlocks. Here we assume that conflicts are likely, and that major steps have to be taken to avoid them.
Let’s have an example on this:
Suppose that in a car showroom a customer wants to go for a test drive; but before the manager says anything, it has to be clear that the car is empty and ready for driving. What if another customer has already requested a test drive of the same car? If the manager allows both of them to drive the car simultaneously, relying on mutual understanding between the customers, we call it optimistic concurrency. But if the manager wants to be sure there are no conflicts, he lets the customers test drive one by one. This is what we call pessimistic concurrency.
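In SQL Server terms, the SNAPSHOT isolation level described above is an optimistic approach (row versioning instead of blocking). If desired, the default READ COMMITTED level can also be switched to a versioning-based, optimistic variant; a sketch (run it when no other connections are active in the database):

ALTER DATABASE OLAP SET READ_COMMITTED_SNAPSHOT ON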

Programming Foreach Loop Container – Enumerating Excel Files

Sometimes we are not aware of the contents of an object, yet we have to perform a certain task as many times as there are items in that object. For example, a folder contains some text files with employee data, and we have to load the data of all the text files into our SQL Server employee table. Here we don't know the count of the source files, but we have to roll over every file. For this type of situation, SSIS has a unique container called the "Foreach Loop Container" that will enumerate each text file in the folder.
It can be used in other cases too, like enumerating each sheet of an Excel file, parsing each row of a table, enumerating each child node of a given tree, etc. In a SQL Server 2000 DTS package it is a bit difficult to loop over any task, but fortunately SSIS has a special container to accomplish such tasks.
Benefits from SSIS Foreach Loop Container:
  • It keeps the package neat and clean (avoids building of repeated tasks)
  • It can iterate any object dynamically (even if the contents count is not available)
The Foreach Loop Container acts as a repeating control flow in a package. Its operation is similar to the foreach keyword in any advanced programming language. There is a specific type of enumerator for each type of object, and the Foreach Loop Container uses one of these enumerators to carry out its operation.
Below is the list of the various enumerators provided by the SSIS Foreach Loop Container:
  1. Foreach File Enumerator: It enumerates files in a folder. The plus point here is it can traverse through subfolders also.
  2. Foreach Item Enumerator: It enumerates items in a collection. Like enumerating rows and columns in an Excel sheet.
  3. Foreach ADO Enumerator: Useful for enumerating rows in tables.
  4. Foreach ADO.NET Schema Rowset Enumerator: To enumerate through schema information about a data source. For example, to get list of tables in a database.
  5. Foreach From Variable Enumerator: Used to enumerate through the object contained in a variable. (if the object is enumerable)
  6. Foreach NodeList Enumerator: Used to enumerate the result set of an XML Path Language (XPath) expression.
  7. Foreach SMO Enumerator: It enumerates through SQL Server Management Objects (SMO) objects.


Let's consider the below example:
Here we have a set of Excel files in a folder containing different data but of the same structure. Our requirement is to load all the data from these Excel files to our destination table in the SQL Server database.
For this example, the Excel files contain information about the sales of pencils across different countries. We have a total of 5 Excel files in the folder named "SampleExcelFiles". To achieve this, we take the help of the Foreach Loop Container provided by SQL Server Integration Services (SSIS).
[Note: All other components and tasks are described in the earlier article named "SSIS Programming Basic"]
After creating the necessary components, the "Foreach Loop Container" is created with only one Data Flow Task inside it. This single Data Flow Task will load the data of all the Excel files to the destination, with the source connection string changing each time the loop iterates.
For this, one more package-level variable has to be created:
  • CurrentExcelPath: This will keep track of the path of current excel file after each iteration.
Create one Excel connection manager to handle all these Excel files. Point it at an existing Excel file to avoid validation errors; this workbook is just for validation purposes. At runtime, i.e. while the package executes, the connection string will automatically be updated to the value of CurrentExcelPath. Alternatively, you can keep it empty, provided the DelayValidation property of this connection manager is set to True. In this example the connection manager name is "Excel Connection Manager".
In the connection manager property, select “Expressions” and then set the first property as “ConnectionString“.
Property Expression Editor
Click on the ellipsis button, and in the "Expression Builder" dialog box enter the following expression:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @[User::CurrentExcelPath] + ";Extended Properties=\"Excel 8.0;HDR=Yes\""
Expression Builder
Note: In this expression, @[User::CurrentExcelPath] is used to complete the connection string at runtime.
Then drag the "Foreach Loop Container" into the package area and double click it to open the Foreach Loop Editor. On the Collection page, select the Foreach File Enumerator. In Enumerator Configuration, select the folder where all the Excel files are stored, and state the file filter to be considered. For this example, the folder path is "I:\Site work\SampleExcelFiles" and the file filter is "*.xls".
Foreach Loop Editor
Then, in the Variable Mappings page, map index 0 to the user-defined string variable defined above for storing the current Excel file path, i.e. CurrentExcelPath. Click OK to close the Foreach Loop Editor.
Now create the data flow task that uses the "Excel Connection Manager" as its connection manager to handle the Excel files. Add the data flow components (source, row counter and destination) and then execute the package.
Execute Package
Output:
Output

Programming SSIS Logging


SQL Server provides us with a powerful data integration service called SSIS. But to make it more robust and effective, we need to monitor some vital measures so as to increase performance, troubleshoot errors and keep track of the data flow. The best way SSIS provides to accomplish this is SSIS logging. SSIS will log entries to a log file or table on the basis of the events that occur during the execution of the package.
Here in this section, we will go through the different features of SSIS logging and the implementation of the same by programming Integration Services.
We will consider the same package used in "SSIS Programming Basic".
To start with SSIS logging, click on "SSIS" in the menu bar, or right click on an empty area in the Control Flow and select "Logging…".
ssisloggingprog1
You will get a dialog box as below. The logging is disabled by default.
ssisloggingprog2
To enable it, tick the checkbox in the Containers pane and choose a provider type for the logging. Let's say the provider type is a text file, i.e. "SSIS Log Provider for Text Files", and click on the Add… button.
Provide a file connection manager for the logging. Let the connection manager name be "SSIS Log.txt" (the same as the file name, but you can give it a different name).
ssisloggingprog3


Set the properties of the log provider as given below in the figure. Select the log provider.
ssisloggingprog4
Let’s consider that we are going to implement logging at the package level. We need information regarding “OnError” and “OnWarning” events.
List of events that are supported by SSIS are:
OnError
OnExecStatusChanged
OnInformation
OnPipelinePostEndOfRowset
OnPipelinePostPrimeOutput
OnPipelinePreEndOfRowset
OnPipelinePrePrimeOutput
OnPipelineRowsSent
OnPostExecute
OnPostValidate
OnPreExecute
OnPreValidate
OnProgress
OnQueryCancel
OnTaskFailed
OnVariableValueChanged
OnWarning
Diagnostic
[Note: The description for each of the events is given in "Configure SSIS Logs" dialog box.]
Click on Details tab to select the events that we need to track. And then click on Advanced >> button to get more options for a particular event.
Below is the list of additional information that we can log for a particular event:
Value          Description
Computer       The name of the computer on which the logged event occurred.
Operator       The user name of the person who started the package.
SourceName     The name of the package, container, or task in which the logged event occurred.
SourceID       The global unique identifier (GUID) of the package, container, or task in which the logged event occurred.
ExecutionID    The global unique identifier of the package execution instance.
MessageText    A message associated with the log entry.
DataBytes      Reserved for future use.
Let's say we are interested in keeping Computer, SourceName and MessageText as the logging information for our package, as shown below.
ssisloggingprog5
Click OK, save the package and execute it.
You can see the logs generated in the file “SSIS Log.txt“.
Let’s do the same logging mechanism by SSIS programming.
Steps to follow for implementing logging mechanism in SSIS package:
Step 1: Enable the logging mode of the package:
package.LoggingMode = DTSLoggingMode.Enabled;
Step 2: Create the log file connection manager; in our case it is a flat file named "SSIS Log.txt".
Step 3: Add a package log provider. In SSIS 2005 the CreationName is "DTS.LogProviderTextFile.1".
Step 4: Assign the connection manager to the log provider.
Step 5: Select the events that have to be considered for logging, i.e. "OnError" and "OnWarning".
Step 6: Select the options for each event that have to be gathered while the package is running, i.e. Computer, SourceName and MessageText. This can be achieved by using the DTSEventColumnFilter structure.
Step 7: Set the column filter for its respective events.
Step 8: Add the rest of the components of the package.
Note: For detail about the other components of the package, please refer to our earlier post named “SSIS Programming Basic
Note: The SQL Lion team has developed an application named Expert Logger to set up the SSIS logging mechanism batch-wise, i.e. to implement logging for any number of packages at a time. Download the free version from here.
