• Pandas Types Options
• Passing as Python objects instead of Excel arrays • Using the Pandas type converters outside of a UDF
Pandas DataFrames and Series can be used as function arguments and return types for Excel worksheet functions using the decoratorxl_func.
When used as an argument, the range specified in Excel will be converted into a Pandas DataFrame or Series as specified by the function signature.
When returning a DataFrame or Series, a range of data will be returned to Excel. PyXLL can automatically resize the range of the array formula to match the returned data by settingauto_resize=Trueinxl_func. The following code shows a function that returns a random dataframe, including the index:
from pyxll import xl_func import pandas as pd import numpy as np
@xl_func("int rows, int columns: dataframe<index=True>", auto_resize=True)
def random_dataframe(rows, columns):
data = np.random.rand(rows, columns)
column_names = [chr(ord('A') + x) for x in range(columns)] return pd.DataFrame(data, columns=column_names)
A function can also take a DataFrame or Series as one its arguments. When passing a DataFrame or Series to a function the whole data area must be selected in Excel and used as the argument to the function.
The following function takes a DataFrame including the column headers row, but not including the index column and returns the sum of a single column.:
from pyxll import xl_func
@xl_func("dataframe<index=False, columns=True>, str: float")
def sum_column(df, col_name):
col = df[col_name] return col.sum()
See alsoPandas DataFrame Formatting.
3.12.1 Pandas Types Options
The following options are available for thedataframeandseriesargument and return types: • dataframe, when used as an argument type
dataframe<index=0, columns=1, dtype=None, dtypes=None, index_dtype=None>
index Number of columns to use as the DataFrame’s index. Specifying more than one will result in a DataFrame where the index is a MultiIndex.
columns Number of rows to use as the DataFrame’s columns. Specifying more than one will result in a DataFrame where the columns is a MultiIndex. If used in conjunction withindexthen any column headers on the index columns will be used to name the index.
dtype Datatype for the values in the dataframe. May not be set withdtypes.
dtypes Dictionary of column name -> datatype for the values in the dataframe. May not be set withdtype.
index_dtype Datatype for the values in the dataframe’s index. • dataframe, when used as a return type
dataframe<index=None, columns=True>
index If True include the index when returning to Excel, if False don’t. If None, only include if the index is named.
columns If True include the column headers, if False don’t. • series, when used as an argument type
series<index=1, transpose=None, dtype=None, index_dtype=None>
index Number of columns (or rows, depending on the orientation of the Series) to use as the Series index.
transpose Set to True if the Series is arranged horizontally or False if vertically. By default the orienta- tion will be guessed from the structure of the data.
index_dtype Datatype for the values in the Series’ index. • series, when used as a return type
series<index=True, transpose=False>
index If True include the index when returning to Excel, if False don’t.
transpose Set to True if the Series should be arranged horizontally, or False if vertically.
3.12.2 Passing as Python objects instead of Excel arrays
When passing large DataFrames between Python functions, it is not always necessary to return the full DataFrame to Excel and it can be expensive reconstructing the DataFrame from the Excel range each time. In those cases you can use theobjectreturn type to return a handle to the Python object. Functions taking thedataframeand seriestypes can accept object handles.
The following returns a random DataFrame as a Python object, so will appear in Excel as a single cell with a handle to that object:
from pyxll import xl_func import pandas as pd import numpy as np
@xl_func("int rows, int columns: object")
def random_dataframe(rows, columns):
data = np.random.rand(rows, columns)
column_names = [chr(ord('A') + x) for x in range(columns)] return pd.DataFrame(data, columns=column_names)
The result of a function like this can be passed to another function that expects a DataFrame: @xl_func("dataframe, int: dataframe<index=True>", auto_resize=True)
def dataframe_head(df, num_rows):
return df.head(num_rows)
This allows for large datasets to be used in Excel efficiently, especially where the data set would be cumbersome to deal with in Excel when unpacked.
3.12.3 Using the Pandas type converters outside of a UDF
Sometimes it’s useful to be able to convert a range of data into a DataFrame, or a DataFrame into a range of data for Excel, in a context other than function decorated withxl_func. Or, you might have a function that takes the vartype, which could be a DataFrame depending on other arguments.
In these cases the functionget_type_convertercan be used. For example: from pyxll import get_type_converter
to_dataframe = get_type_converter("var", "dataframe<index=True>") df = to_dataframe(data)
Or the other way:
to_array = get_type_converter("dataframe", "var") data = to_array(df)
All the parameters for thedataframeandseriestypes can be used to control how the conversion is per- formed.