1- What is SSIS?
SSIS
was first introduced with SQL Server 2005, which was the next generation of SQL
Server software after SQL Server 2000. SSIS is an ETL tool that handles
data Extraction, Transformation and load. In addition, it can handles more
tasks besides the ETL tasks such as data profiling, file system manipulation
etc.... If you know C#.net or VB.net, you can use the script task to
perform much more efficient batch operations.
2- What is SSIS control flow integration?
SSIS
control flow allow you to program graphically how the tasks will run by using
the logical connectors between tasks. there are three basic logical
connectors that you can use: success, failure, or complete. Further more,
you can use the FX (expression) to handle more complex conditions in your
control flow.
3- What is Data Transformation?
The
data transformation stage applies a series of rules or functions to the
extracted data from the source to derive the data for loading into the end
target.
4- What can you do in an SSIS task?
SSIS
tasks can be broken up into different categories below:
a-
Data Preparation Tasks: for example, move file to different locations, validate
data, and analyze data.
b-
Process Communication Tasks: for example, Interact with services, processes, or
other Integration Services packages.
c-
SQL Server Tasks: for example, Execute bulk insert, SQL Statements, transfers
of database objects.
d-
Database Maintenance Tasks: for example, Perform backups, maintain indexes, and
other miscellaneous tasks.
5- What kind of variables can you create?
You
can create global variables and task level variables in SSIS. For programmers,
these variables are the same as global and function level variables. A global
variable is available to all tasks across the entire job. Variables created in
tasks are only available within that task.
6- What kind of containers can you use with
SSIS packages?
There
are three types of containers: sequence, for loops and for each loops.
A
sequence container is a simple way to group similar taks together. Think of a
sequence container as an organization container for more complex SSIS packages.
A
for loop container is what you can use to execute your tasks to a certain
number of times. For example, you need to update records ten times, you can
place the task that update the records inside this for loop container and
specify 10 as the end of the loops. by using the for loop container, you
don't have to create neither ten different packages to do the same task, or
have to run the whole package ten times when you schedule your job.
A
for each loop container will be useful when you don't know a head of time how
many times a task should perform. for instance, let's say that you want to
delete all the files inside a folder, but you don't know how many files are
there at any particular time, by using for each loop, it can go through the
collection of files and delete them for you, after the collection is emptied
out, it knows that when it should stop.
7- How to create the deployment utility?
Deployment
is the process in which packages converts from development mode into
executables mode. For deploying the SSIS package, you can directly deploy the
package by right clicking Integration Services project and build it. This
will save the package.dtsx file on the project\bin folder. Also, you can
create the deployment utility using which the package can be deployed at either
SQL Server or as a file on any location.
For creating deployment utility, follow these
steps:
1.
Right click on project and click on properites.
2.
Select "True" for createDeploymentUtiltiy Option. Also, you can
set the deployment path.
3.
Now close the window after making the changes and build the project by right
clicking on the project.
4.
A deployment folder will be created in BIN folder of you main project
location.
5.
Inside the deployment folder, you will find .manifest file, double clicking on
it you can get options to deploy package on SQL Server.
6.
Log in to SQL Server and check in MSDB on Integration Services.
8- What is Manifest file in SSIS?
Manifiest
file is the utility which can be used to deploy the package using wizard on
file system and SQL Server database.
9- What is File system deployment?
File
system deployment means to save pacakge file on local or network drive. Then
you can use SQL Agent job to schedule when the packages will run.
10- How to back up or retrive the SSIS
packages?
If
your package is deployed on SQL Server then you can back up the MSDB database
as all the package on SQL server deploys at MSDB.
11- What is data flow task in SSIS?
Data
flow task is the pumping mechanism that moves data from source to destination.
However, in the case of SSIS, you have much more control of what happens from
start to finish. In fact, you have a set of out of the box transformation
components that you snap together to clean and manipulate the data while it is
in the data pipeline. Just as the control flow handles the main workflow
of the package, the Data Flow handles the transformation of data. Almost
anything that manipulates data falls into the Data Flow Category.
12- What is data Profiling task?
Data
profiling is the process of analyzing the source data to better understand what
condition the data is in, in terms of cleanliness, patterns, numbers or nulls,
and so on. data profiling task usually be used at the beggining of the
development cycle to support the design of the destination database schema.
Note that this task is not used when you develop the normal recurring ETL
packages.
13- What is the multicast Transformation in
SSIS?
The
Multicast transform, as the name implies, can send single data input to
multiple output paths easily. You may want to use this transformation to send a
path to multiple destinations sliced in different ways. The multicast
transformation is similar to the Split Transformation because both send data to
multiple outputs. However, you can not specify the conditions for which part of
the data will be in which output in the Multicast transformation.
14- Differnce between Merge and Union All?
The
Merge transformation can merge data from two paths into a single output.
The Transform is usefull when you wish to break out your Data Flow into a
path that handles certain errors and then merge it back into the main Data Flow
downstream after the errors have been handled. it's also useful if you
wish to merge data from two Data Sources.
Note
that the data must be sorted before using the Merge Transformation. you can do
this by using the sort transformation prior to the merge or by specifying an
ORDER By clause in the source connection. Also, the metadata must be the
same for both paths. For example, CustomerID column cannot be a numeric column
in one path and a character column in the other path.
The
Union All Transformation works much the same way as the Merge Transformation,
but it does not require the data to be sorted. It takes the outputs from
multiple sources or transforms and combines them into a single result set.
15- OLE DB Command Transform?
The
OLE DB Command Transform is a component designed to execute a SQL Statement for
each row in an input stream. This task is analogous to an ADO Command Object
being created, prepared, and executed for each row of a result set. The input
stream provides that data for parameters that can be set into the SQL Statement
that is either an In - line statement or a stored procedure call.
16 - Execute package task?
The
Execute Package Task enables you to build SSIS solutions called parent packages
that execute other packages called "Child Packages". You'll
find this capability an indispensable part of your SSIS development as your
packages begin to grow.
Separating
packages into discrete functional workflows makes for shorter development and
testing cycles and facilitates best development practices. in SSIS, the child
package is aware of the parenment package parameters and can reach out and
access those parameters - level configuration values. The majority of the
configuratble properties are in the Package tab of the Executable package Task
Editor.
The
first option is to provide the location of the child package. The options
here are either File system and SQL Server. You can deploy SSIS package
in the FIle system task as an .dtsx file, or within the msdb database of a SQL
Server instance. if you select file system, you must first create a new
connection manager connection to the child package. if the child package is
located in a SQL Server, you'll need to provide the OLE DB Connection Manager
for the SQL Server that holds your packages. In either case, browse to
and then select the child package within the connection to set the package to
execute in the task.
17 - What is transaction in SSIS package and
how to implement it?
Packages
use transactions to bind the database actions that tasks perform into atomic
units. and by doing this maintain data integrity. Al MS IS container
types - packages the For loop, For each loop, and Sequence containers, and the
task hosts that encapsulate each task can be configured to use transactions. IS
provides three options for configuring transactions: Not supported, Supported,
and Required.
Require indicates
that the container start a transaction, unless one is already started by its
parent container. if a transaction already exists, the containter joins the
transaction For example, if a package that is not configured to support
transactions includes a Sequence container that uses the Required option, the
Sequence Container would start its own transaction. If the package were
configured to use the Required option, the Sequence containter would join the
package transaction.
Supported indicates that the container does not start a
transaction, but joins any transaction started by its parent container. For
example, if a package with four Executable SQL tasks starts a transaction and
all four tasks use the Supported option, the database updates performed by the
Execute SQL tasks are rolled back if any taks fails. if the package does
not start a transaction, the four execute SQL tasks are not bound by a
transaction, and no database updates except the ones performed by the failed
task are rolled back
Not Supported indicates that the container does not start a
transaction or join an existing transaction. A transaction started by a parent
container does not affect child containers that have been configured to Not
Support transactions. For instance, if a package is configured to start a
transaction and a For Loop Container in the package uses the NotSupported
option, none of the tasks in the For Loop can roll back if they fail.
18 - Difference between Execute TSQL Task and
Execute SQL Task
In
SSIS there is one tasks Execute TSQL task which is similar to Execute SQL task.
Will see what is the difference between two.
Execute TSQL Task:
Pros:
Takes less memory, faster perfomance
Cons:
Output into variable not supported, Only supports ADO.net connection
Execute SQL Task:
Pros:
Support output into variables and multiple types of connection, parameterized
query possible.
Cons:
Takes more memory, slower performance compared to TSQL task.
19- Precedence Constraints
A
task will only execute if the condition that is set by the precedence
constraint preceding the task is met. By using these constraints, it will
choose different execution paths depending on the success or failure of other
tasks.
Success
- Workflow will proceed when the preceding containter executes successfully.
Indicated in control flow by a solid green line.
Failure
- Workflow will proceed when the preceding container's execution results in a
failure. Indicated in control flow by a solid red line.
Completion
- Workflow will proceed when the preceding container's execution completes,
regardless of success or failure. Indicated in control flow by a solid blue
line.
Expression/Constraint
with logical AND - workflow will proceed when specified expression and
constraints evaluate to true. Indicated in control flow by a solid color line
along with a small 'fx' icon next to it. Color of line depends on logical
constraint chosen (e.g. success = green, completion = blue).
20- What is the use of config file in SSIS?
Config
file in SSIS is used to provide inputs to connection manager different
properites which package use to assign values at run time dynamically.
Using
config file user need to make changes to the config file which package will
take automatically at run time because of using it you don't need to every time
make changes in the packages in case you are deploying package on multiple
servers or locations. There are multiple ways in which configuration values can
be stored.
XML
configuration file: Store the config file as an XML file..
Environment
variable Store the config in on of the environment variables.
Registry
entry Store the confi in the registry
Parent
package variable Store the config as a variable in the package that contains
the tasks.
SQL
Server Store the config in t a table in SQL Server
21- Different between Control Flow and Data
Flow?
a.
Control flow is for designing the flow of the package. Data flow is for ETL
process.
b.
Data Flow is the subset of control flow
c.
There will be only one control flow while multiple dataflow can exists.
d.
Data flow cannot work without a control flow
All
process based tasks are part of control flow while ETL related tasks are the
part of Dataflow which is again a subset of control flow.
22 - Check point?
Checkpoint
is the property in SSIS which enables the project to restart from the point of
failure. When we set the property to true package create the checkpoin file
which stores the information about package execution and use to restart package
from the point of failure. If the package runs successfully, the
checkpoint file is deleted, and then re-created the next time the package runs.
No comments:
Post a Comment