data

data.data_preprocessing module

federated.data.data_preprocessing.create_class_distributed_dataset(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[Dict, tensorflow_federated.python.simulation.client_data.ClientData]

Distributes the data so that each client receives only one type of data.

Args:

X (np.ndarray): Input.

y (np.ndarray): Output.

number_of_clients (int): Number of clients.

Returns:

[Dict, tff.simulation.ClientData]: A dictionary and a TensorFlow Federated dataset containing the distributed data.
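A minimal NumPy sketch of a one-class-per-client split follows. It assumes y holds discrete class labels and that the returned dictionary maps a client id to {"x": ..., "y": ...}; the helper name class_distributed_split and that layout are hypothetical, not the module's actual code, and the sketch leaves out the conversion to tff.simulation.ClientData.

```python
import numpy as np


def class_distributed_split(X: np.ndarray, y: np.ndarray, number_of_clients: int) -> dict:
    """Hypothetical sketch: give each client the samples of exactly one class."""
    classes = np.unique(y)
    if number_of_clients > len(classes):
        # With fewer classes than clients, some clients would have to share a class.
        raise ValueError("number_of_clients must not exceed the number of classes")
    clients_data = {}
    # Client i receives every sample whose label equals the i-th sorted class;
    # classes beyond number_of_clients are left unused in this sketch.
    for client_id, label in enumerate(classes[:number_of_clients]):
        mask = y == label
        clients_data[str(client_id)] = {"x": X[mask], "y": y[mask]}
    return clients_data


X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 2, 0, 1, 2])
clients = class_distributed_split(X, y, number_of_clients=3)
```

Raising an error when there are more clients than classes is a choice made for this sketch; the documented function may handle that case differently.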

federated.data.data_preprocessing.create_corrupted_non_iid_dataset(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[Dict, tensorflow_federated.python.simulation.client_data.ClientData]

Distributes the data so that each client has non-IID data, except client 1, which only has values in the interval [20, 40].

Args:

X (np.ndarray): Input.

y (np.ndarray): Output.

number_of_clients (int): Number of clients.

Returns:

[Dict, tff.simulation.ClientData]: A dictionary and a TensorFlow Federated dataset containing the distributed data.
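The split can be sketched in NumPy as follows. This assumes y holds the values that the [20, 40] interval refers to, that "client 1" is the first client (key "0" here), and that the other clients get non-IID data by sorting the remaining samples by y and cutting them into contiguous shards. The helper corrupted_non_iid_split and the sort-and-shard scheme are illustrative assumptions, not necessarily what the module does.

```python
import numpy as np


def corrupted_non_iid_split(X: np.ndarray, y: np.ndarray, number_of_clients: int,
                            low: float = 20.0, high: float = 40.0) -> dict:
    """Hypothetical sketch: the first client only gets targets in [low, high];
    the remaining samples are split non-IID among the other clients."""
    if number_of_clients < 2:
        raise ValueError("need the restricted client plus at least one other client")
    in_range = (y >= low) & (y <= high)
    clients_data = {"0": {"x": X[in_range], "y": y[in_range]}}
    # Sort the remaining samples by target value and cut them into contiguous
    # shards, so each remaining client sees a different range of values (non-IID).
    rest = np.flatnonzero(~in_range)
    rest = rest[np.argsort(y[rest], kind="stable")]
    for client_id, shard in enumerate(np.array_split(rest, number_of_clients - 1), start=1):
        clients_data[str(client_id)] = {"x": X[shard], "y": y[shard]}
    return clients_data


X = np.arange(8).reshape(8, 1)
y = np.array([5.0, 25.0, 50.0, 30.0, 10.0, 60.0, 35.0, 70.0])
clients = corrupted_non_iid_split(X, y, number_of_clients=3)
```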

federated.data.data_preprocessing.create_dataset(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[None, tensorflow_federated.python.simulation.client_data.ClientData]

Converts the inputs and outputs into a TensorFlow Federated dataset.

Args:

X (np.ndarray): Inputs.

y (np.ndarray): Outputs.

number_of_clients (int): The number of clients to split the data between.

Returns:

[None, tff.simulation.ClientData]: None and a TensorFlow Federated dataset containing the federated data distribution.
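One plausible reading of splitting the data between clients is a random IID split. The sketch below, using the hypothetical helper iid_split, shuffles once and cuts the samples into near-equal shards. It returns only the per-client data, whereas the documented function returns None in place of a dictionary together with a ClientData object.

```python
import numpy as np


def iid_split(X: np.ndarray, y: np.ndarray, number_of_clients: int, seed: int = 0) -> dict:
    """Hypothetical sketch: shuffle once, then cut into near-equal shards."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    # np.array_split tolerates lengths that do not divide evenly:
    # the first len(y) % number_of_clients shards get one extra sample.
    return {
        str(client_id): {"x": X[shard], "y": y[shard]}
        for client_id, shard in enumerate(np.array_split(order, number_of_clients))
    }


X = np.arange(20).reshape(10, 2)
y = np.arange(10)
clients = iid_split(X, y, number_of_clients=3)
```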

federated.data.data_preprocessing.create_non_iid_dataset(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[Dict, tensorflow_federated.python.simulation.client_data.ClientData]

Distributes the data so that each client has non-IID data.

Args:

X (np.ndarray): Input.

y (np.ndarray): Output.

number_of_clients (int): Number of clients.

Returns:

[Dict, tff.simulation.ClientData]: A dictionary and a TensorFlow Federated dataset containing the distributed data.
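A common way to generate non-IID label distributions is per-class Dirichlet sampling; the sketch below uses that technique as a stand-in, since the module's actual scheme is not documented here. The helper dirichlet_non_iid_split and its alpha and seed parameters are hypothetical, and y is assumed to hold discrete class labels. It returns index arrays, so client i's data is X[parts[i]], y[parts[i]].

```python
import numpy as np


def dirichlet_non_iid_split(y: np.ndarray, number_of_clients: int,
                            alpha: float = 0.5, seed: int = 0) -> list:
    """Hypothetical sketch: per class, draw client proportions from Dir(alpha).

    Returns one index array per client. Smaller alpha gives more skewed
    (more strongly non-IID) label distributions.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(number_of_clients)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        proportions = rng.dirichlet(alpha * np.ones(number_of_clients))
        # Turn the proportions into cut points inside this class's index array.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]


y = np.repeat([0, 1, 2], 20)
parts = dirichlet_non_iid_split(y, number_of_clients=4)
```

Every sample is assigned to exactly one client, but individual clients may receive few or no samples of some classes, which is the intended skew.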

federated.data.data_preprocessing.create_tff_dataset(clients_data: Dict) → tensorflow_federated.python.simulation.client_data.ClientData

Converts a dictionary of client data into a TensorFlow Federated dataset.

Args:

clients_data (Dict): Data for each client.

Returns:

tff.simulation.ClientData: The federated data distribution.

federated.data.data_preprocessing.create_unbalanced_data(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[Dict, tensorflow_federated.python.simulation.client_data.ClientData]

Distributes the data so that one client only has one type of data, while the remaining clients have non-IID data.

Args:

X (np.ndarray): Input.

y (np.ndarray): Output.

number_of_clients (int): Number of clients.

Returns:

[Dict, tff.simulation.ClientData]: A dictionary and a TensorFlow Federated dataset containing the distributed data.

federated.data.data_preprocessing.create_uniform_dataset(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[Dict, tensorflow_federated.python.simulation.client_data.ClientData]

Distributes the data evenly so that each client holds an equal amount of each class.

Args:

X (np.ndarray): Input.

y (np.ndarray): Output.

number_of_clients (int): Number of clients.

Returns:

[Dict, tff.simulation.ClientData]: A dictionary and a TensorFlow Federated dataset containing the distributed data.
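A stratified split that gives every client the same number of samples from each class can be sketched as below. The helper uniform_split and the choice to drop each class's remainder are assumptions for illustration, and y is assumed to hold discrete class labels.

```python
import numpy as np


def uniform_split(X: np.ndarray, y: np.ndarray, number_of_clients: int, seed: int = 0) -> dict:
    """Hypothetical sketch: every client receives the same number of samples per class."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(number_of_clients)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        # Drop the remainder so each client gets exactly len(idx) // number_of_clients samples.
        per_client = len(idx) // number_of_clients
        for client_id in range(number_of_clients):
            start = client_id * per_client
            client_indices[client_id].extend(idx[start:start + per_client].tolist())
    return {
        str(client_id): {"x": X[np.array(ix, dtype=int)], "y": y[np.array(ix, dtype=int)]}
        for client_id, ix in enumerate(client_indices)
    }


X = np.arange(12).reshape(12, 1)
y = np.array([0] * 7 + [1] * 5)
clients = uniform_split(X, y, number_of_clients=2)
```

Dropping the remainder keeps the per-class counts exactly equal across clients, at the cost of discarding up to number_of_clients - 1 samples per class.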

federated.data.data_preprocessing.get_datasets(train_batch_size: int = 32, test_batch_size: int = 32, train_shuffle_buffer_size: int = 10000, test_shuffle_buffer_size: int = 10000, train_epochs: int = 5, test_epochs: int = 5, centralized: bool = False, normalized: bool = True, data_selector: Callable[[numpy.ndarray, numpy.ndarray, int], tensorflow_federated.python.simulation.client_data.ClientData] = <function create_dataset>, number_of_clients: int = 5) → Tuple[tensorflow.python.data.ops.dataset_ops.DatasetV2, tensorflow.python.data.ops.dataset_ops.DatasetV2, int]

Preprocesses the datasets and returns input-ready datasets.

Args:

train_batch_size (int, optional): Training batch size. Defaults to 32.

test_batch_size (int, optional): Test batch size. Defaults to 32.

train_shuffle_buffer_size (int, optional): Training shuffle buffer size. Defaults to 10000.

test_shuffle_buffer_size (int, optional): Testing shuffle buffer size. Defaults to 10000.

train_epochs (int, optional): Training epochs. Defaults to 5.

test_epochs (int, optional): Test epochs. Defaults to 5.

centralized (bool, optional): Whether to create dataset for centralized learning. Defaults to False.

normalized (bool, optional): Whether to normalize the data. Defaults to True.

data_selector (Callable[[np.ndarray, np.ndarray, int], tff.simulation.ClientData], optional): Which data distribution to use. Defaults to create_dataset.

number_of_clients (int, optional): Number of clients. Defaults to 5.

Returns:

[tf.data.Dataset, tf.data.Dataset, int]: Input-ready datasets and the number of data points.

federated.data.data_preprocessing.load_data(normalized: bool = True, data_analysis: bool = False, data_selector: Optional[Callable[[numpy.ndarray, numpy.ndarray, int], tensorflow_federated.python.simulation.client_data.ClientData]] = None, number_of_clients: int = 5) → Tuple[tensorflow_federated.python.simulation.client_data.ClientData, tensorflow_federated.python.simulation.client_data.ClientData, int]

Loads data from a CSV file and preprocesses the training and test data separately.

Args:

normalized (bool, optional): Whether to normalize the data. Defaults to True.

data_analysis (bool, optional): Whether to load the data for data analysis. Defaults to False.

data_selector (Callable[[np.ndarray, np.ndarray, int], tff.simulation.ClientData], optional): Data distribution. Defaults to None.

number_of_clients (int, optional): Number of clients. Defaults to 5.

Raises:

ValueError: The data has to be normalized to use create_uniform_dataset.

Returns:

[tff.simulation.ClientData, tff.simulation.ClientData, int]: Two tff.simulation.ClientData objects (the training and test data) and an int.

federated.data.data_preprocessing.preprocess_dataset(epochs: int, batch_size: int, shuffle_buffer_size: int) → Callable[[tensorflow.python.data.ops.dataset_ops.DatasetV2], tensorflow.python.data.ops.dataset_ops.DatasetV2]

Returns a function that preprocesses a dataset.

Args:

epochs (int): Number of times to repeat the dataset.

batch_size (int): Batch size.

shuffle_buffer_size (int): Buffer size for shuffling the dataset.

Returns:

Callable[[tf.data.Dataset], tf.data.Dataset]: A callable for preprocessing a dataset object.

federated.data.data_preprocessing.split_dataframe(df)

Splits a dataframe into (input, output) pairs.