Downloading a file over HTTP the SSIS way

by Darren Green 2 Jul 2009 18:14

This post shows you how to download files from a web site whilst really making the most of the SSIS objects that are available. There is no task to do this, so we have to use the Script Task and some simple VB.NET or C# (if you have SQL Server 2008) code. Very often I see suggestions about how to use the .NET class System.Net.WebClient and of course this works, you can code pretty much anything you like in .NET. Here I’d just like to raise the profile of an alternative.

This approach uses the HTTP Connection Manager, one of the stock connection managers, so you can use configurations and property expressions in the same way you would for all other connections. Settings like the security details that you would want to make configurable already are, but if you take the .NET route you have to write quite a lot of code to manage those values via package variables. Using the connection manager we get all of that flexibility for free. The screenshot below illustrate some of the options we have.

HTTPConnectionManagerEditor

Using the HttpClientConnection class makes for much simpler code as well. I have demonstrated two methods, DownloadFile which just downloads a file to disk, and DownloadData which downloads the file and retains it in memory. In each case we show a message box to note the completion of the download. You can download a sample package below, but first the code:

Imports System
Imports System.IO
Imports System.Text
Imports System.Windows.Forms
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()

        ' Get the unmanaged connection object, from the connection manager called "HTTP Connection Manager"
        Dim nativeObject As Object = Dts.Connections("HTTP Connection Manager").AcquireConnection(Nothing)

        ' Create a new HTTP client connection
        Dim connection As New HttpClientConnection(nativeObject)


        ' Download the file #1
        ' Save the file from the connection manager to the local path specified
        Dim filename As String = "C:\Temp\Sample.txt"
        connection.DownloadFile(filename, True)

        ' Confirm file is there
        If File.Exists(filename) Then
            MessageBox.Show(String.Format("File {0} has been downloaded.", filename))
        End If


        ' Download the file #2
        ' Read the text file straight into memory
        Dim buffer As Byte() = connection.DownloadData()
        Dim data As String = Encoding.ASCII.GetString(buffer)

        ' Display the file contents
        MessageBox.Show(data)


        Dts.TaskResult = Dts.Results.Success
    End Sub

End Class

Sample Package HTTPDownload.dtsx (74KB)

Kommentare (31) -

16.07.2009 02:42:06 #

Joel Watson

Works like a dream. Thanks for making the dtsx and sample files available so I could try it out straight away!

Joel Watson United States

16.07.2009 21:21:29 #

PKSpence

Just out of curosity, why are you doing this:

        Dim filename As String = "C:\Temp\Sample.txt"
        connection.DownloadFile("C:\Temp\Sample.txt", True)

Seems to me that the variable containing the name of the file should be specifed as a parameter to the .DownloadFile() method:

        Dim filename As String = "C:\Temp\Sample.txt"
        connection.DownloadFile(filename, True)

Just my .02 worth

PKSpence United States

17.07.2009 14:17:02 #

Darren Green

PKSpence, you're quite right I meant to use the variable consistently, code updated. Thanks.

Darren Green United Kingdom

19.07.2009 15:02:03 #

CozyRoc

You can check an alternative based on WebClient class here: www.cozyroc.com/script/http-upload-download-task

CozyRoc United States

16.09.2009 20:35:21 #

Kyle

Thank you very much for the clear article, worked like a champ (except I had to change MessageBox to MsgBox).

Kyle United States

16.09.2009 22:29:58 #

Darren Green

Kyle, did you forget to add Imports System.Windows.Forms? MsgBox only works in VB.NET, and as far as I recall is a hangover from old VB, really just a shortcut left in the VB.NET language. If you prefer it then that's fine too.

Darren Green United Kingdom

22.09.2009 21:42:08 #

Casey

This worked wonderfully and was exactly what we needed.  We were able to accomplish something much faster than I thought possible using the information you supplied. Thank you.

Casey United States

29.09.2009 21:55:59 #

Kanthi

Hi Darren,

Your Script Works Like a  Charm.

But i Have One Question Though - How to DownLoad the Dinamically Changing File Name.
I want to Automate the Process of Downloading.
The File Name Changes Everyday. e.g with New Date.

I trying to use expressions buyt doesn't Works.

Thanks in Advance
Kanthi

Kanthi United States

20.11.2009 19:04:32 #

SeanO

I am gettng a couple errors. It looks like it has to do with using this package on a different machine... Since no one else has mentioned it, I bet it is because I am not familiar with dtsx... Anyone have an idea of what's going on?

Error: 2009-11-20 10:59:49.17
   Code: 0xC0016016
   Source:
   Description: Failed to decrypt protected XML node "DTS:Property" with error 0
x8009000B "Key not valid for use in specified state.". You may not be authorized
to access this information. This error occurs when there is a cryptographic err
or. Verify that the correct key is available.
End Error


and

Error: 2009-11-20 10:59:49.35
   Code: 0x00000002
   Source: Script Task
   Description: The script threw an exception: The system cannot find the path s
pecified. (Exception from HRESULT: 0x80070003)
End Error


Anyone? Help a n00b!

SeanO

21.11.2009 00:21:37 #

Joe

Would you be so kind to translate these to line to C#:
===============
        ' Get the unmanaged connection object, from the connection manager called "HTTP Connection Manager"
        Dim nativeObject As Object = Dts.Connections("HTTP Connection Manager").AcquireConnection(Nothing)

        ' Create a new HTTP client connection
        Dim connection As New HttpClientConnection(nativeObject)
===============
Thanks

Joe

26.11.2009 14:38:04 #

Rizwan

Cant we get whole directory, i mean use wild cards.

Rizwan India

01.12.2009 22:38:35 #

dass

Hi Darren,

Can you light me how to download gzip file from http site using authentication.
cheers
dass

dass New Zealand

03.12.2009 18:41:32 #

dhiva

Issue: couldn't download the file Using HTTPS:// with Certicates.

I try to use this above script and download the file file from HTTPS:// using the CERTIFICATES. It connects to the server. But It doesn't download the file. It creates the file with the Ceriticate name on it.

The text file shows "Virtual user wll0001f logged in." The wll0001f is the certificate name.

I have created the HTTP Connection manager in my SSIS with CERTIFICATE property.

Could I get any help?

dhiva United States

11.01.2010 19:18:20 #

HollyK

Thanks very much, works like a charm.  

HollyK United States

28.07.2010 16:01:42 #

apal

I am in SSIS. Where do I add this code? I have to copy the internet file to a local folder. I created the HTTP conn manager. But now whihc SSIS task will handle this code? Please help.

apal United States

29.07.2010 07:58:01 #

Darren

apal, the Script Task can be used to host code.

Darren United Kingdom

09.08.2010 20:33:14 #

Naresh Jagtiani

Need some help here,  would this script work with https and certificates.  I am getting a proxy error.
When I select Test Connection in connection manager it succeeds however it gives Proxy authentication failed.
Error: The script threw an exception: Proxy authentication failed. This error occurs when login credentials are not provided, or the credentials are incorrect.

Thanks in advance.

Naresh Jagtiani United States

10.08.2010 22:33:24 #

Darren

Naresh, This work fine with SSL.

Your error indicates a problem with the proxy server authentication, so I would talk to you network team.

There is a limitation with the HTTP Connection manager, it doesn't have any option to pass through the default Windows credentials which is a simple way to supply valid Windows credentials to a proxy server. If that is issue use the System.Net.WebClient approach and pass in the default credentials like this -

webClient.Credentials = System.Net.CredentialCache.DefaultCredentials

Darren United Kingdom

11.08.2010 19:35:49 #

Naresh Jagtiani

Hello Darren,  Thanks for your reply.  This is what I put in the script task which worked for me.

Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Web
Imports System.Security.Cryptography.X509Certificates


Public Class WebRetrieve
    Public Shared Sub Main()
        Dim mycert As New System.Security.Cryptography.X509Certificates.X509Certificate _
(System.Security.Cryptography.X509Certificates.X509Certificate.CreateFromCertFile _
("C:\Certifcates\cert.cer"))
  
       Dim wr As HttpWebRequest = CType(WebRequest.Create("https://reports/reports.zip"), HttpWebRequest)
        wr.ClientCertificates.Add(mycert)

        Dim ws As HttpWebResponse = CType(wr.GetResponse(), HttpWebResponse)
        Dim str As Stream = ws.GetResponseStream()
        Dim inBuf(100000000) As Byte
        Dim bytesToRead As Integer = CInt(inBuf.Length)
        Dim bytesRead As Integer = 0
        While bytesToRead > 0
            Dim n As Integer = str.Read(inBuf, bytesRead, bytesToRead)
            If n = 0 Then
                Exit While
            End If
            bytesRead += n
            bytesToRead -= n
        End While
        Dim fstr As New FileStream("c:\New\reports.zip", FileMode.OpenOrCreate, FileAccess.Write)
        fstr.Write(inBuf, 0, bytesRead)
        str.Close()
        fstr.Close()
    End Sub
End Class

Naresh Jagtiani United States

23.09.2010 17:11:33 #

Ahalya

Hi Darren, Thanks for the script.  New at SSIS - this was my first script task - worked great!

Ahalya Canada

27.10.2010 18:48:06 #

ashok

Hi Darren,

Your Script Works Like a  Charm.

But i Have One Question Though - How to DownLoad the Dinamically Changing File Name.
I want to Automate the Process of Downloading.
The File Name Changes Everyday. e.g with New Date.

I trying to use expressions buyt doesn't Works.

Thanks in Advance
Ashok

ashok India

28.10.2010 13:53:58 #

Darren Green

Ashok, The file to download is controlled by the ServerURL property of the HTTP Connection Manager. You can set an expression on that property.

Darren Green United Kingdom

02.11.2010 20:07:34 #

ram

Need some help here,  would this script work with https with login credientials.  Thanks in advance.

ram United States

09.11.2010 13:49:09 #

Darren Green

Ram why not try it? The HTTP Connection Manager allows you to specify the protocol as https, and also allows you to specify credentials.

Darren Green United Kingdom

04.11.2011 16:02:29 #

Dinesh

I am trying to download the Excel file located in HTTP. It is downloading but when opening the file, am getting this error "excel cannot open the file becuase the file format or file extension is not valid. verify that the file has not been corrupted and that the file extension matches the format of the file"

Dinesh India

13.12.2011 08:46:48 #

Dinesh

Hi Darren, when am downloading the file it is only downloading 3kb out 1700kb file. I am not able to figure it out why it is downloading only 3kb file. Please help me.

Dinesh India

09.01.2012 20:48:36 #

Marzipan

We are trying to download documents from a sharepoint list using the HTTP connection.

Any idea why we get a 401 error when the file is a infopath document.  A xml document created with a text editor is ok, .xlsx, .docx, .png files all can be downloaded except the infopath document.

All are stored in the same "Shared Documents" library.

Marzipan United States

09.04.2012 16:33:41 #

Marzipan

Found the answer.  You need to add "?NoRedirect=true"  (w/o quotes) to the URL.

This gives you raw XML.  

Now onto processing XML, we are studying up on XSLT :)

Note: if you need the schema for a infopath form template (xsn):
1.  add  ".cab" to the filename (w/o quotes)
2.  right click filename and select explore
3.  copy the MySchema.xsd to another location, give it a more descriptive name.

Marzipan United States

12.04.2012 08:58:46 #

Swetha

When i place a package with httpconenction manager with useproxy set to false(in fact no changes done in proxy tab for this connection manager) in job and execute it, i get the error:

Connection manager "HTTPConnectionManager" This error can occur when the server does not exist, or the proxy settings are incorrect.  

--Swetha.

Swetha India

12.04.2012 10:49:38 #

sambit

when i am pasting this code in the script task,then it shows that dts is not defined can you just say me why this error comes and what is the solution for that error.and the script task will be used in the control flow or the data flow.

sambit United States

05.07.2012 22:52:25 #

Dinesh

my http link having xml data...when I download the file as xml, am getting code in xml file. Any idea how to handle this, pls...

Dinesh India

Kommentar schreiben

  Country flag

biuquote
  • Kommentar
  • Live Vorschau
Loading

RecentComments

Comment RSS
SQLIS | Script Task/Component and Template Information

Script Task/Component and Template Information

by Allan Mitchell 8 Dec 2009 06:46

The Script Task and Script component are often used by people developing SSIS packages because they are easy to use and now because SSIS could be perceived to be more developer friendly they are very powerful.

That being said we should no be using them everywhere.  There are generally Tasks/Components already provided that will do the job it may be that we have to rethink the way we want to draw our package.

I had cause last week to break out the script component in SQL Server 2008 SP1 and found that it was broken.  I don’t know when it broke as I do not use them all that often.  My error was as below.

Templates 

 

Something must have overwritten this template information.  I looked in Event Viewer and tried the things it suggested but the templates still did not work.  Here is how I got them eventually to work for me (Your Mileage may vary)

Open up a Command Prompt window using an administrator level account and “as an administrator”

vsta.exe /hostid SSIS_ScriptTask /setup

vsta.exe /hostid SSIS_ScriptComponent /setup

 

This worked for me.  Hope it helps.

Comments (1) -

4/3/2012 2:23:54 PM #

mario amatucci

Hi I've got this error
when I switch to c# language instead of the default vb, I am pretty sure I need to reference some dll before, can you help me?
thanks, mario

mario amatucci Ireland

Add comment

  Country flag

biuquote
  • Comment
  • Preview
Loading

RecentComments

Comment RSS