data
data.data_preprocessing module
- federated.data.data_preprocessing.create_class_distributed_dataset(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[Dict, tensorflow_federated.python.simulation.client_data.ClientData]
Distributes the data so that each client receives only one type of data.
- Args:
X (np.ndarray): Input.
y (np.ndarray): Output.
number_of_clients (int): Number of clients.
- Returns:
[Dict, tff.simulation.ClientData]: A dictionary and a TensorFlow Federated dataset containing the distributed dataset.
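The idea of giving each client a single type of data can be sketched as follows. This is an illustration only, not the module's implementation: it reads "type" as class label, uses plain Python lists in place of np.ndarray, and the helper name `partition_by_class` and its round-robin label-to-client rule are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def partition_by_class(
    X: Sequence[Sequence[float]], y: Sequence[int], number_of_clients: int
) -> Dict[int, List[Tuple[Sequence[float], int]]]:
    """Hypothetical helper: route every sample to the client that owns its label.

    Assumption: labels are mapped to clients round-robin in sorted order, so
    each client sees exactly one class when there are at least as many
    clients as classes.
    """
    labels = sorted(set(y))
    label_to_client = {label: i % number_of_clients for i, label in enumerate(labels)}
    clients = defaultdict(list)
    for features, label in zip(X, y):
        clients[label_to_client[label]].append((features, label))
    return dict(clients)


X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = [0, 1, 2, 0, 1, 2]
clients_data = partition_by_class(X, y, number_of_clients=3)
for client_id, samples in sorted(clients_data.items()):
    # Print the set of labels held by each client.
    print(client_id, sorted({label for _, label in samples}))
```

A dictionary of this shape is the kind of per-client structure that a function such as create_tff_dataset could then convert into a federated dataset.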
- federated.data.data_preprocessing.create_corrupted_non_iid_dataset(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[Dict, tensorflow_federated.python.simulation.client_data.ClientData]
Distributes the data so that every client has non-iid data except client 1, which only has values in the interval [20, 40].
- Args:
X (np.ndarray): Input.
y (np.ndarray): Output.
number_of_clients (int): Number of clients.
- Returns:
[Dict, tff.simulation.ClientData]: A dictionary and a TensorFlow Federated dataset containing the distributed dataset.
- federated.data.data_preprocessing.create_dataset(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[None, tensorflow_federated.python.simulation.client_data.ClientData]
Converts the inputs and outputs into a TensorFlow Federated dataset split between the clients.
- Args:
X (np.ndarray): Inputs.
y (np.ndarray): Outputs.
number_of_clients (int): The number of clients to split the data between.
- Returns:
[None, tff.simulation.ClientData]: None and the federated data distribution.
- federated.data.data_preprocessing.create_non_iid_dataset(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[Dict, tensorflow_federated.python.simulation.client_data.ClientData]
Distributes the data so that each client has non-iid data.
- Args:
X (np.ndarray): Input.
y (np.ndarray): Output.
number_of_clients (int): Number of clients.
- Returns:
[Dict, tff.simulation.ClientData]: A dictionary and a TensorFlow Federated dataset containing the distributed dataset.
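One common way to obtain a non-iid split is to sort the samples by label and cut the result into contiguous shards, so that each client sees only a narrow slice of the label range. The sketch below shows that technique under stated assumptions: it may not be how create_non_iid_dataset works, the helper name `partition_sorted_shards` is hypothetical, and plain lists stand in for np.ndarray.

```python
from typing import Dict, List, Sequence, Tuple


def partition_sorted_shards(
    X: Sequence[Sequence[float]], y: Sequence[int], number_of_clients: int
) -> Dict[int, List[Tuple[Sequence[float], int]]]:
    """Hypothetical helper: sort samples by label, then cut contiguous shards."""
    # sorted() is stable, so samples with equal labels keep their original order.
    order = sorted(range(len(y)), key=lambda i: y[i])
    shard_size = len(order) // number_of_clients
    shards = {}
    for client_id in range(number_of_clients):
        start = client_id * shard_size
        # The last client also takes any remainder.
        end = len(order) if client_id == number_of_clients - 1 else start + shard_size
        shards[client_id] = [(X[i], y[i]) for i in order[start:end]]
    return shards


X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
non_iid_clients = partition_sorted_shards(X, y, number_of_clients=4)
for client_id, samples in non_iid_clients.items():
    # Print the labels held by each client, in shard order.
    print(client_id, [label for _, label in samples])
```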
- federated.data.data_preprocessing.create_tff_dataset(clients_data: Dict) → tensorflow_federated.python.simulation.client_data.ClientData
Converts a dictionary into a TensorFlow Federated dataset.
- Args:
clients_data (Dict): Dictionary holding the data for each client.
- Returns:
tff.simulation.ClientData: Returns federated data distribution.
- federated.data.data_preprocessing.create_unbalanced_data(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[Dict, tensorflow_federated.python.simulation.client_data.ClientData]
Distributes the data so that one client has only one type of data, while the remaining clients have non-iid data.
- Args:
X (np.ndarray): Input.
y (np.ndarray): Output.
number_of_clients (int): Number of clients.
- Returns:
[Dict, tff.simulation.ClientData]: A dictionary and a TensorFlow Federated dataset containing the distributed dataset.
- federated.data.data_preprocessing.create_uniform_dataset(X: numpy.ndarray, y: numpy.ndarray, number_of_clients: int) → Tuple[Dict, tensorflow_federated.python.simulation.client_data.ClientData]
Distributes the data evenly so that each client holds an equal amount of each class.
- Args:
X (np.ndarray): Input.
y (np.ndarray): Output.
number_of_clients (int): Number of clients.
- Returns:
[Dict, tff.simulation.ClientData]: A dictionary and a TensorFlow Federated dataset containing the distributed dataset.
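A uniform split of this kind can be sketched by grouping the samples per class and handing each client an equally sized slice of every group. The code below is a minimal illustration under assumptions, not the module's implementation: `partition_uniform` is a hypothetical name, plain lists replace np.ndarray, and samples that do not divide evenly are simply dropped.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def partition_uniform(
    X: Sequence[Sequence[float]], y: Sequence[int], number_of_clients: int
) -> Dict[int, List[Tuple[Sequence[float], int]]]:
    """Hypothetical helper: every client gets the same number of samples per class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    clients = {client_id: [] for client_id in range(number_of_clients)}
    for label, items in by_class.items():
        # Assumption: leftovers that do not divide evenly are discarded.
        per_client = len(items) // number_of_clients
        for client_id in range(number_of_clients):
            chunk = items[client_id * per_client:(client_id + 1) * per_client]
            clients[client_id].extend((features, label) for features in chunk)
    return clients


X = [[float(i)] for i in range(10)]
y = [0] * 4 + [1] * 6
uniform_clients = partition_uniform(X, y, number_of_clients=2)
for client_id, samples in uniform_clients.items():
    # Print how many samples of each class the client holds.
    print(client_id, Counter(label for _, label in samples))
```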
- federated.data.data_preprocessing.get_datasets(train_batch_size: int = 32, test_batch_size: int = 32, train_shuffle_buffer_size: int = 10000, test_shuffle_buffer_size: int = 10000, train_epochs: int = 5, test_epochs: int = 5, centralized: bool = False, normalized: bool = True, data_selector: Callable[[numpy.ndarray, numpy.ndarray, int], tensorflow_federated.python.simulation.client_data.ClientData] = <function create_dataset>, number_of_clients: int = 5) → Tuple[tensorflow.python.data.ops.dataset_ops.DatasetV2, tensorflow.python.data.ops.dataset_ops.DatasetV2, int]
Preprocesses the datasets and returns them input-ready.
- Args:
train_batch_size (int, optional): Training batch size. Defaults to 32.
test_batch_size (int, optional): Test batch size. Defaults to 32.
train_shuffle_buffer_size (int, optional): Training shuffle buffer size. Defaults to 10000.
test_shuffle_buffer_size (int, optional): Testing shuffle buffer size. Defaults to 10000.
train_epochs (int, optional): Training epochs. Defaults to 5.
test_epochs (int, optional): Test epochs. Defaults to 5.
centralized (bool, optional): Whether to create dataset for centralized learning. Defaults to False.
normalized (bool, optional): If the data should be normalized. Defaults to True.
data_selector (Callable[ [np.ndarray, np.ndarray, int], tff.simulation.ClientData ], optional): Which data distribution to use. Defaults to create_dataset.
number_of_clients (int, optional): Number of clients. Defaults to 5.
- Returns:
[tf.data.Dataset, tf.data.Dataset, int]: Input-ready datasets, and number of datapoints.
- federated.data.data_preprocessing.load_data(normalized: bool = True, data_analysis: bool = False, data_selector: Optional[Callable[[numpy.ndarray, numpy.ndarray, int], tensorflow_federated.python.simulation.client_data.ClientData]] = None, number_of_clients: int = 5) → Tuple[tensorflow_federated.python.simulation.client_data.ClientData, tensorflow_federated.python.simulation.client_data.ClientData, int]
Loads data from a CSV file and preprocesses the training and test data separately.
- Args:
normalized (bool, optional): Whether to normalize the data. Defaults to True.
data_analysis (bool, optional): Load data for data analysis. Defaults to False.
data_selector (Callable[ [np.ndarray, np.ndarray, int], tff.simulation.ClientData ], optional): Data distribution. Defaults to None.
number_of_clients (int, optional): Number of clients. Defaults to 5.
- Raises:
ValueError: The data has to be normalized to use create_uniform_dataset.
- Returns:
[tff.simulation.ClientData, tff.simulation.ClientData, int]: The training and test data as tff.simulation.ClientData, followed by an integer.
- federated.data.data_preprocessing.preprocess_dataset(epochs: int, batch_size: int, shuffle_buffer_size: int) → Callable[[tensorflow.python.data.ops.dataset_ops.DatasetV2], tensorflow.python.data.ops.dataset_ops.DatasetV2]
Returns a function that preprocesses a dataset.
- Args:
epochs (int): How many times to repeat a batch.
batch_size (int): Batch size.
shuffle_buffer_size (int): Buffer size for shuffling the dataset.
- Returns:
Callable[[tf.data.Dataset], tf.data.Dataset]: A callable for preprocessing a dataset object.
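The "function that returns a function" pattern can be illustrated without TensorFlow. The sketch below is a pure-Python analogue, not the module's code: `make_preprocess_fn` and its `seed` parameter are hypothetical, a list stands in for tf.data.Dataset, and the repeat, shuffle, batch ordering is an assumption.

```python
import random
from typing import Callable, List, Sequence


def make_preprocess_fn(
    epochs: int, batch_size: int, shuffle_buffer_size: int, seed: int = 0
) -> Callable[[Sequence[int]], List[List[int]]]:
    """Hypothetical analogue: capture the settings and return a preprocessing callable."""

    def preprocess(dataset: Sequence[int]) -> List[List[int]]:
        rng = random.Random(seed)
        # Repeat the whole dataset once per epoch.
        repeated = list(dataset) * epochs
        # Buffered shuffle: keep up to shuffle_buffer_size items and emit a
        # random one each time the buffer is full (a size of 1 keeps the order).
        buffer, shuffled = [], []
        for item in repeated:
            buffer.append(item)
            if len(buffer) >= shuffle_buffer_size:
                shuffled.append(buffer.pop(rng.randrange(len(buffer))))
        # Drain whatever is left in the buffer, still in random order.
        while buffer:
            shuffled.append(buffer.pop(rng.randrange(len(buffer))))
        # Group into batches; the final batch may be smaller.
        return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

    return preprocess


preprocess = make_preprocess_fn(epochs=2, batch_size=4, shuffle_buffer_size=3)
batches = preprocess(range(5))
print([len(batch) for batch in batches])
```

Returning a callable lets the same settings be applied to every client's dataset without passing the three arguments around each time.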
- federated.data.data_preprocessing.split_dataframe(df)
Splits a dataframe into (input, output) pairs.