Wednesday, September 16, 2009

Reading Excel files in C#

When importing any kind of data into an application simply using a text file is sometimes not sufficient, especially if the users already have information in an Excel file.

Using an Excel file in C# is like connecting to a database. You provide a connection string (which, among other things, include the file path) and use SQL statements to read the file. The following snippet of code connects an Excel file to a DataGridView form component.

OleDbConnection connection = new OleDbConnection(Provider=Microsoft.Jet.OLEDB.4.0;Data Source=foo.xls;Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\";);


try
{
  connection.Open();


  OleDbCommand cmd = connection.CreateCommand();
  cmd.CommandText = "SELECT * FROM [Sheet1$]";


  DataSet dataSet = new DataSet();
  OleDbDataAdapter dataAdapter = new OleDbDataAdapter(cmd);


  dataAdapter.Fill(dataSet, "ImportData");
  dataGridView_Sheet1.DataSource = dataSet.Tables["ImportData"].DefaultView;
}

catch { }
finally
{
  if (connection != null)
  {
    connection.Close();
    connection.Dispose();
  }
}

The SQL-statement includes the name of the desired sheet in the Excel file. Do note that the sheet name is fully editable by the user, and the default name is language dependant. Reading all sheets in an Excel file is fairly simple:

public string[] GetExcelSheetNames()
{
  OleDbConnection objConn = null;
  System.Data.DataTable dt = null;

  try
  {
    objConn = new OleDbConnection(ExcelConnectionString);
    objConn.Open();
  
    // Get the data table containg the schema guid.
    dt = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
  
    if (dt == null)
    {
      return null;
    }
  
    string[] excelSheets = new String[dt.Rows.Count];
    int i = 0;
  
    foreach (System.Data.DataRow row in dt.Rows)
    {
      excelSheets[i] = row["TABLE_NAME"].ToString();
      i++;
    }
  
    return excelSheets;
  }
  catch
  {
    return null;
  }
  finally
  {
    // Clean up.
    if (objConn != null)
    {
      objConn.Close();
      objConn.Dispose();
    }
    if (dt != null)
    {
      dt.Dispose();
    }
  }
}



The OleDbConnection object takes a connection string as parameter in the constructor. The format of the parameter can found at http://www.connectionstrings.com/. The connection string in the example above is suitable for Excel files with the xls file extension. The correct connection string for Excel 2007 files (with e.g. the xlsx file extension can be found at the Connection String web site.

Two important things are worth mentioning about the connection string. The HDR parameter indicates that the first row in the Excel file shall be treated as column headers. If the first row is not used to describe the data, simply set HDR=No. The IMEX parameter makes all columns be treated as strings, if the parameter is set to No then is the column’s data type based on the first row with data (which is the second row in the file if HDR is set to Yes). If a column in the file has mixed data with integers and characters then should this parameter be set to Yes. This little parameter has caused me a lot of problems…

If the first row in the data has only numeric then is the data type set to e.g. double. But no exception is thrown when a row’s column has a mixture of numeric and characters, the data is instead set to null.

No comments:

Post a Comment