# Data underlying the publication: Learning the mechanisms of network growth
For detailed documentation on the simulation and the network classes, see the corresponding publication.
## Data source
This dataset contains 6733 simulated dynamic networks following 9 different growth mechanisms.
The simulation, written in Python, consisted of a continuous time branching process, which generated a tree that was then collapsed to a network.
Code repository for network simulation and feature extraction: https://gitfront.io/r/user-6239985/R9WcT8Msr46T/DynamicNetworkSimulation/
### DatasetNetworkSource.zip
Contains 6733 networks of size 20,000, each identified by a `class-id` code. For example, `7-606` is class 7, id 606. Each network has two .csv files: `[class-id]_nodes.csv`, with the nodes and their arrival times, and `[class-id]_edges.csv`, with the edge pairs.
There are 9 classes:
| **Code** | **Name** | **Growth Mechanism** | **Parameters** | **Range** |
|----------|----------|-----------------------------------------------|---------------------------------------|---------------------------------------------------------------|
| 0 | U | Uniform Attachment | None | - |
| 1 | P | Affine PA | $a,b$ | $a \in (1, 4), b \in (1, 4)$ |
| 2 | Fpl | Power-law Fitness | $(x_{\rm min}, \tau)$ | $x_{\rm min} \in (0.5, 1), \tau \in (2, 4)$ |
| 3 | Fexp | Exponential Fitness | $\lambda$ | $\lambda$ |
| 4 | FplA | Power-law Fitness, Lognormal Aging | $(x_{\rm min}, \tau)$ $(\mu, \sigma)$ | $x_{\rm min} \in (0.5, 1), \tau \in (2, 2.7)$ $\mu |
| 5 | FexpA | Exponential Fitness, Lognormal Aging | $\lambda$ $(\mu, \sigma)$ | $\lambda$ |
| 6 | FunifP | Affine PA, Uniform Fitness | $(a,b)$ $(c,d)$ | $a \in (1, 4), b \in (1, 4), c \in (0.1, 1), d \in (1, 5)$ |
| 7 | AP | Affine PA, Lognormal Aging | $(a,b)$ $(\mu, \sigma)$ | $a \in (3.3, 7), b \in (1, 4)$, $\mu \in (0.1, 3), \sigma = 1$ |
| 8 | FexpAP | Affine PA, Exponential Fitness, Lognormal Aging | $(a,b)$ $\lambda$ $(\mu, \sigma)$ | $a \in (1, 4), b \in (1, 4)$, $\lambda \in (0.1, a + \frac{b}{E[M]})$ |
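As an illustration of the naming scheme and the class codes above, the following minimal sketch splits an identifier such as `7-606` into its class code and id and derives the two file names. The helper names `parse_network_id` and `network_files`, the `CLASS_NAMES` dictionary, and the extraction folder `DatasetNetworkSource` are assumptions introduced here for illustration; they are not part of the published code.

```python
from pathlib import Path

# Class codes and short names, copied from the table above.
CLASS_NAMES = {
    0: "U", 1: "P", 2: "Fpl", 3: "Fexp", 4: "FplA",
    5: "FexpA", 6: "FunifP", 7: "AP", 8: "FexpAP",
}


def parse_network_id(identifier):
    """Split an identifier such as '7-606' into (class code, id)."""
    class_part, id_part = identifier.split("-", 1)
    class_code = int(class_part)
    if class_code not in CLASS_NAMES:
        raise ValueError(f"unknown class code: {class_code}")
    return class_code, int(id_part)


def network_files(identifier, root="DatasetNetworkSource"):
    """Return the (nodes, edges) paths of one network.

    `root` is a hypothetical folder holding the extracted archive.
    """
    root = Path(root)
    return root / f"{identifier}_nodes.csv", root / f"{identifier}_edges.csv"


class_code, network_index = parse_network_id("7-606")
nodes_path, edges_path = network_files("7-606")
print(CLASS_NAMES[class_code], network_index, nodes_path.name, edges_path.name)
```

The sketch only builds paths; it does not open the files, since the column layout of the two .csv files is not specified here.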
## results.csv
This file contains the parameter choices and simulation results for each of the networks in DatasetNetworkSource. It also contains parameter choices that led to repeatedly failed simulations (i.e. the branching process died out before reaching the size of 20,000 nodes).
The columns are `Name,Code,Network Size,PA,Fitness,Aging,Result`:
| Name | Code | Network Size | PA | Fitness | Aging | Result |
|---|---|---|---|---|---|---|
| class-id | class | Size of generated network | Preferential attachment parameter choice | Fitness parameter choice | Aging parameter choice | Number of attempts of the branching process simulation needed before the size was reached. |
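A minimal sketch of how the table could be summarised, here as the mean number of simulation attempts per class. It assumes that `Result` holds a plain integer for successful simulations and skips any row where it does not; how failed parameter choices are encoded is not stated above, so that handling is a guess. The function name `mean_tries_per_class` and the three sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def mean_tries_per_class(csv_file):
    """Average the `Result` column per class code.

    Rows whose `Result` is not a plain integer are skipped (assumption:
    such rows, if any, correspond to failed parameter choices).
    """
    totals = defaultdict(lambda: [0, 0])  # code -> [sum of tries, row count]
    for row in csv.DictReader(csv_file):
        try:
            tries = int(row["Result"])
        except (TypeError, ValueError):
            continue
        totals[row["Code"]][0] += tries
        totals[row["Code"]][1] += 1
    return {code: total / count for code, (total, count) in totals.items()}


# Made-up sample rows that only mimic the header of results.csv.
sample = io.StringIO(
    "Name,Code,Network Size,PA,Fitness,Aging,Result\n"
    "0-1,0,20000,,,,1\n"
    "0-2,0,20000,,,,3\n"
    "7-606,7,20000,,,,2\n"
)
summary = mean_tries_per_class(sample)
print(summary)
```

For the real file, the same function can be called on `open("results.csv", newline="")`.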
## validation_metrics.csv
This file contains a variety of network statistics, calculated for each network to check that the simulation behaved as expected. The columns are:
`class, index, network_age, network_density, number_of_edges, mean_outdegree, std_outdegree, min_outdegree, 25%_outdegree, 50%_outdegree, 75%_outdegree, max_outdegree`
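One simple consistency check on these columns is that the out-degree summary statistics are ordered (minimum, quartiles, maximum). The sketch below flags rows where they are not. It is only an illustration: the function name `rows_with_unordered_quantiles` is hypothetical, the sample rows are made up, and the sample carries only the columns the check needs.

```python
import csv
import io

# Columns that should be non-decreasing from left to right within a row.
ORDERED_COLUMNS = [
    "min_outdegree", "25%_outdegree", "50%_outdegree",
    "75%_outdegree", "max_outdegree",
]


def rows_with_unordered_quantiles(csv_file):
    """Return (class, index) for rows whose out-degree summary is not sorted."""
    suspicious = []
    # skipinitialspace tolerates a header written as "class, index, ...".
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        values = [float(row[column]) for column in ORDERED_COLUMNS]
        if values != sorted(values):
            suspicious.append((row["class"], row["index"]))
    return suspicious


# Made-up sample: the second row has its median above the third quartile.
sample = io.StringIO(
    "class,index,min_outdegree,25%_outdegree,50%_outdegree,75%_outdegree,max_outdegree\n"
    "0,1,1,1,2,3,40\n"
    "1,5,1,1,4,3,40\n"
)
flagged = rows_with_unordered_quantiles(sample)
print(flagged)
```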
## StaticFeatures.csv
Table with 36 static features (i.e. features not using the arrival time of the vertices) for all networks in DatasetNetworkSource.
The static features are:
* the assortativity coefficient (the Pearson correlation coefficient between the degrees of two vertices connected by an edge),
* the global clustering coefficient (also called transitivity: the fraction of connected triplets of vertices that form triangles).
For each of the following node features we compute the minimum, the maximum, the mean, the standard deviation, and 3 to 5 quantiles:
* degrees (quantiles .125, .25, .5, .75, .875),
* coreness (quantiles .25, .5, .75; the coreness of a vertex equals the largest number $k$ such that the vertex is in the $k$-core -- the maximal induced subgraph with all vertices having degree $k$ or higher),
* the number of triangles that contain the vertex (quantiles .80, .90, .95, .97, .99, which have sufficient variance to be used as features),
* the local clustering coefficient (quantiles .5, .6, .7, .8, .9; the local clustering coefficient is the fraction of pairs of neighbors of the vertex that are connected to each other.)
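The definitions above can be made concrete on a toy graph. The following pure-Python sketch computes degrees, per-vertex triangle counts, local clustering coefficients, transitivity, and coreness for a triangle with one pendant vertex. It is not the feature-extraction code from the repository: it assumes an undirected simple graph, sets the local clustering coefficient of vertices with fewer than two neighbours to 0, uses a simple quadratic-time peeling loop for coreness that would be slow on 20,000 nodes, and leaves out the assortativity coefficient and the quantile summaries.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected simple graph as {vertex: set of neighbours}; self-loops are dropped."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def triangles_per_node(adjacency):
    """Number of triangles containing each vertex."""
    return {
        v: sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
        for v, neighbours in adjacency.items()
    }


def local_clustering(adjacency):
    """Fraction of neighbour pairs of each vertex that are themselves connected."""
    triangles = triangles_per_node(adjacency)
    result = {}
    for v, neighbours in adjacency.items():
        pairs = len(neighbours) * (len(neighbours) - 1) // 2
        result[v] = triangles[v] / pairs if pairs else 0.0
    return result


def transitivity(adjacency):
    """Closed triplets divided by all connected triplets."""
    # Summing over vertices counts each triangle three times, once per corner.
    closed = sum(triangles_per_node(adjacency).values())
    triplets = sum(len(n) * (len(n) - 1) // 2 for n in adjacency.values())
    return closed / triplets if triplets else 0.0


def coreness(adjacency):
    """Largest k such that the vertex lies in the k-core (simple peeling)."""
    degree = {v: len(n) for v, n in adjacency.items()}
    remaining = set(adjacency)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # lowest remaining degree
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adjacency[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Toy graph: triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
print({v: len(n) for v, n in adj.items()})
print(triangles_per_node(adj))
print(local_clustering(adj))
print(transitivity(adj))
print(coreness(adj))
```

On this toy graph the pendant vertex has coreness 1 while the three triangle vertices have coreness 2, and vertex 2 has local clustering 1/3 because only one of its three neighbour pairs is connected.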
## AlternativeDynamicFeatures/
Contains the 10x10 feature matrix with time-cohorts for each network in DatasetNetworkSource.
The filename is specified as follows: `[class-id]_matrix.csv`
For the definition of the feature matrix, see the corresponding publication.
## DynamicFeatures/
Contains the 10x10 feature matrix with size-cohorts for each network in DatasetNetworkSource.
The filename is specified as follows: `[class-id]_matrix.csv`
For the definition of the feature matrix, see the corresponding publication.
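A minimal sketch of reading one feature matrix and checking its shape. It assumes that each `[class-id]_matrix.csv` is a comma-separated file with 10 rows of 10 numeric values and no header row, which is not stated above and should be verified against the actual files. The function name `load_feature_matrix` is hypothetical, and the demonstration writes a placeholder matrix of zeros to a temporary folder rather than using real data.

```python
import csv
import tempfile
from pathlib import Path


def load_feature_matrix(path, size=10):
    """Read a size x size numeric matrix from a headerless CSV file (assumed layout)."""
    with open(path, newline="") as handle:
        rows = [[float(value) for value in row] for row in csv.reader(handle) if row]
    if len(rows) != size or any(len(row) != size for row in rows):
        raise ValueError(f"{path}: expected a {size}x{size} matrix")
    return rows


# Demonstration on a placeholder file of zeros, not on real data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "7-606_matrix.csv"
    demo_path.write_text("\n".join(",".join(["0"] * 10) for _ in range(10)))
    matrix = load_feature_matrix(demo_path)

print(len(matrix), len(matrix[0]))
```

Because both directories use the same file name pattern, the directory name is the only thing that distinguishes a size-cohort matrix from a time-cohort matrix.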