# Module core.steps.split.random_split

Implementation of a random split of the input data set.

## Functions

`RandomSplitPartitionFn(element: Any, num_partitions: int, split_map: Dict[str, float]) ‑> int` : Partition function for a random split of the data, to be used in a `beam.Partition`. It implements a simple random split by drawing, for each element, an integer index from the categorical distribution defined by the values in `split_map`.

```
Args:
    element: Data point, in format tf.train.Example.
    num_partitions: Number of splits, unused here.
    split_map: Dict mapping {split_name: fraction of data in split}.

Returns:
    An integer n, where 0 ≤ n ≤ num_partitions - 1.
```
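The sampling step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the module's implementation: the name `random_partition_sketch` and the optional `rng` parameter are assumptions added here for reproducibility, and the standard library's `random.choices` stands in for whatever sampler the module actually uses.

```python
import random
from typing import Any, Dict, Optional


def random_partition_sketch(element: Any,
                            num_partitions: int,
                            split_map: Dict[str, float],
                            rng: Optional[random.Random] = None) -> int:
    """Draw a partition index from the categorical distribution in split_map.

    Hypothetical helper for illustration only. `element` and
    `num_partitions` are accepted to match the documented signature
    but are not used, mirroring the "unused here" note above.
    """
    sampler = rng if rng is not None else random
    probabilities = list(split_map.values())
    indices = range(len(probabilities))
    # random.choices returns a list of k samples drawn according to `weights`.
    return sampler.choices(indices, weights=probabilities, k=1)[0]


# Usage: route 10,000 dummy elements and count how many land in each split.
split_map = {"train": 0.7, "eval": 0.2, "test": 0.1}
rng = random.Random(0)  # fixed seed so the run is repeatable
counts = [0] * len(split_map)
for _ in range(10_000):
    counts[random_partition_sketch(None, len(split_map), split_map, rng=rng)] += 1
print(dict(zip(split_map, counts)))
```

The index returned for each element follows the insertion order of the keys in `split_map`, so with the map above index 0 corresponds to `"train"`.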

`lint_split_map(split_map: Dict[str, float])` : Small utility to lint the `split_map`.
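The source does not say which checks `lint_split_map` performs. As a rough sketch of the kind of validation such a utility might do, the hypothetical function below (the name `check_split_map_sketch`, the tolerance, and the error messages are all assumptions) enforces the one constraint this page does state: the values in the split map must sum to 1.

```python
import math
from typing import Dict


def check_split_map_sketch(split_map: Dict[str, float]) -> None:
    """Hypothetical validation; raises ValueError for an unusable split map."""
    if not split_map:
        raise ValueError("split_map must contain at least one split.")
    for name, fraction in split_map.items():
        # Each entry is a probability, so it must lie in (0, 1].
        if not 0.0 < fraction <= 1.0:
            raise ValueError(f"Split '{name}' has invalid fraction {fraction}.")
    total = sum(split_map.values())
    # Allow for floating-point rounding when comparing the sum to 1.
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"Split fractions sum to {total}, expected 1.")


# Usage: the even three-way split from this page passes silently.
check_split_map_sketch({"train": 0.334, "eval": 0.333, "test": 0.333})
```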

## Classes

`RandomSplit(split_map: Dict[str, float], statistics=None, schema=None)` : Random split. Use this to randomly split data based on a cumulative distribution function defined by a split_map dict.

```
Random split constructor.

Randomly split the data based on a cumulative distribution function
defined by split_map.

Example usage:

# Split data randomly, but evenly into train, eval and test
>>> split = RandomSplit(
...     split_map={"train": 0.334,
...                "eval": 0.333,
...                "test": 0.333})

Here, each data split gets assigned about one third of the probability
mass. The split is carried out by sampling from the categorical
distribution defined by the values p_i in the split map, i.e.

P(index = i) = p_i, i = 1,...,n ;

where n is the number of splits defined in the split map. Hence, the
values in the split map must sum up to 1. For more information, see
https://en.wikipedia.org/wiki/Categorical_distribution.

Args:
    statistics: Parsed statistics from a preceding StatisticsGen.
    schema: Parsed schema from a preceding SchemaGen.
    split_map: A dict { split_name: fraction of data in split }.

### Ancestors (in MRO)

* zenml.core.steps.split.base_split_step.BaseSplit
* zenml.core.steps.base_step.BaseStep

### Methods

`get_split_names(self) ‑> List[str]`
:   Returns the names of the splits associated with this split step.

Returns:
A list of strings, which are the split names.

`partition_fn(self)`
:   Returns the partition function associated with the current split type,
along with keyword arguments used in the signature of the partition
function.

To be eligible for use in a Split Step, the partition_fn has to adhere
to the following design contract:

1. The signature is of the following type:

>>> def partition_fn(element, n, **kwargs) -> int,

where n is the number of splits;
2. The partition_fn only returns non-negative integers i less than n, i.e. ::

0 ≤ i ≤ n - 1.

Returns:
A tuple (partition_fn, kwargs) of the partition function and its