Thursday, December 29, 2016

Improving SSIS ETL Performance by Avoiding Asynchronous Transformations

While performance tuning your SSIS package, you may often find that the transformations in your data flows take a long time to complete. One way to improve performance is to avoid asynchronous transformations. An asynchronous transformation is defined as a transformation that produces an output buffer that is different from its input buffer. To put it simply, a row doesn’t exit the transformation immediately after it enters. An example of this is the Sort transformation. Data does not start leaving the transformation until all the rows have entered and have been sorted in memory.
There are two types of asynchronous transformations: fully-blocking and semi-blocking. Fully blocking transformations read all input rows before output begins. Semi-blocking transformations will allow at least some rows out before having all rows inputted even though it produces a new buffer.

Examples of fully-blocking transformations include:

  • Sort
  • Aggregate
  • Fuzzy Lookup
  • Fuzzy Grouping

Examples of semi-blocking transformations include:

  • Merge Join
  • Pivot
  • Union All
Here are some tips that will help you minimize your use of asynchronous transformations:
  • Instead of using a Merge Join in your Data Flow, perform the joins in the source query or in a transform environment.
  • If you absolutely have to use Merge Join, make sure that you sort the data in your source query. In the Advanced Editor, set the “IsSorted” property to true, and set the Sort Key Position on the Output Columns to the appropriate values.
  • Instead of using an Aggregate transform do perform a count, consider using Row Count, which is a synchronous transformation.
  • If you are using a Pivot or Unpivot transformation and it is performing poorly, consider using staging tables in your solution in order to leverage your SQL server environment to perform these transformations instead of doing it in SSIS.
  • Even though Union All is a semi-blocking asynchronous transformation, you will probably not achieve better performance by replacing the Union All with multiple inserts.

No comments:

Search This Blog

DAX - Grouping on multiple columns and the count

Please go thorugh the below URL for entire soultion. http://community.powerbi.com/t5/Desktop/DAX-Grouping-on-multiple-columns-and-the-cou...