This tests an alternative to Erwin Kalvelagen’s script for generating a sparse random digraph.

We start by loading the magrittr package, which supplies the %>% pipe and the extract() and add() aliases used later.

library(magrittr)

Erwin’s script

Now we time Erwin’s script (excluding output). Since we are not writing out the results, duplicates are removed from the data frame itself rather than when the output is written.

# Set the number of nodes.
n <- 5000
# Set the desired number of arcs.
nn <- n^2 / 100
# Use a fixed random number seed for reproducibility (hopefully).
set.seed(123)
start.time <- proc.time()
# Generate a random collection of arcs in a data frame.
df <- data.frame(
  ni = sample(n, nn, replace=TRUE),
  nj = sample(n, nn, replace=TRUE))
# Sort the arcs.
df <- df[order(df$ni, df$nj),]
# Remove duplicates.
df <- unique(df)
# Show the time consumed.
proc.time() - start.time
   user  system elapsed 
  0.294   0.023   0.318 

How many arcs did we actually get (compared to how many we wanted)?

cat(paste0("Generated ", nrow(df), " of desired ", nn, " arcs."))
Generated 248744 of desired 250000 arcs.
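The shortfall is what sampling with replacement predicts. As a back-of-the-envelope check, assume each of the nn draws picks one of the n^2 ordered pairs (ni, nj) uniformly and independently, which is how the two sample() calls behave. The expected number of distinct pairs is then n^2 * (1 - (1 - 1/n^2)^nn):

```r
n <- 5000
nn <- n^2 / 100
# Expected number of distinct (ni, nj) pairs after nn uniform draws with
# replacement from the n^2 ordered pairs (loops included).
expected.distinct <- n^2 * (1 - (1 - 1 / n^2)^nn)
round(expected.distinct)
```

This comes to about 248,754, close to the 248,744 arcs actually obtained.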

How many loops (arcs with head = tail) did we get?

cat(sum(df[,1] == df[,2]))
48
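Some loops are to be expected. Assuming each draw picks one of the n^2 ordered pairs uniformly and independently, a draw is a loop with probability n / n^2 = 1/n, so about nn / n loops are expected before duplicates are removed:

```r
n <- 5000
nn <- n^2 / 100
# Each draw has ni == nj with probability n / n^2 = 1 / n.
expected.loops <- nn / n
expected.loops
```

That is 50, in line with the 48 observed (duplicate removal can only lower the count slightly).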

How many nodes are isolated (no arcs in or out)?

heads <- unique(df$nj)   # nodes that are the head of at least one arc
tails <- unique(df$ni)   # nodes that are the tail of at least one arc
orphans <- setdiff(1:n, union(tails, heads))
cat(paste0(length(heads), " of ", n, " nodes have indegree >= 1.\n"))
5000 of 5000 nodes have indegree >= 1.
cat(paste0(length(tails), " of ", n, " nodes have outdegree >= 1.\n"))
5000 of 5000 nodes have outdegree >= 1.
cat(paste0(length(orphans), " nodes are orphans."))
0 nodes are orphans.
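Zero orphans is also the expected outcome. A node v appears in 2n - 1 of the n^2 ordered pairs (n with v as tail, n with v as head, with the loop (v, v) counted once). Assuming, as before, that the nn draws are uniform and independent, the probability that no draw touches v is (1 - (2n - 1)/n^2)^nn, and linearity of expectation gives the expected number of orphans:

```r
n <- 5000
nn <- n^2 / 100
# Probability that a given node is neither tail nor head of any of the nn draws.
p.isolated <- (1 - (2 * n - 1) / n^2)^nn
# Expected number of isolated nodes (linearity of expectation).
expected.orphans <- n * p.isolated
expected.orphans
```

The result is on the order of 1e-40, so seeing any orphan at this density would be remarkable.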

My script

For the alternative approach, we initially use zero-based indexing for nodes and arcs. The index of arc (i, j) is i*n + j. We will need the inverse of that function (i.e., a function to convert index k back to the indices of the tail and head nodes).

# Map a 0-based arc index k to c(tail, head), both 0-based. Uses the global n.
toArc <- function(k) {
  c(k %/% n, k %% n)
}
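A quick round-trip check on a tiny graph illustrates that this decoding inverts k = i*n + j over every 0-based index. The sketch below uses n = 5 purely for illustration and wraps everything in local() so the session's n = 5000 is not overwritten; fromArc is a hypothetical name for the forward map, and the local toArc repeats the body above because the global one reads the global n.

```r
roundTripOK <- local({
  n <- 5
  fromArc <- function(i, j) i * n + j        # hypothetical forward map
  toArc <- function(k) c(k %/% n, k %% n)    # same body as toArc above
  k <- seq.int(0, n^2 - 1, 1)
  arcs <- sapply(k, toArc)                   # 2 x n^2 matrix: row 1 tails, row 2 heads
  all(fromArc(arcs[1, ], arcs[2, ]) == k)
})
roundTripOK
```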

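We also need to drop selected elements of a vector within a pipe; magrittr's extract() (an alias for the [ operator) does this when given negative indices. To generate the arcs, we start with the arc indices 0, …, n^2 - 1 and remove the indices corresponding to loops. Arc k is a loop if k %/% n == k %% n, or equivalently if k = m * (n + 1) for some m in {0, …, n - 1}. Because R vectors are 1-based, the index k = m * (n + 1) sits at position m * (n + 1) + 1, so the loops occupy positions 1, n + 2, 2n + 3, …, n^2, which is exactly seq.int(1, n^2, n + 1). We then sample nn of the surviving indices without replacement, sort them, and map them to arcs.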
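The position arithmetic is easy to check by hand on a tiny case. This sketch (n = 4, chosen only for illustration and kept inside local() so the session's n is untouched) confirms that the dropped positions hold exactly the loops and that no surviving index is a loop:

```r
loopCheck <- local({
  n <- 4
  k <- seq.int(0, n^2 - 1, 1)            # values 0..15 at positions 1..16
  loopPos <- seq.int(1, n^2, n + 1)      # positions 1, 6, 11, 16 -> values 0, 5, 10, 15
  kept <- k[-loopPos]
  c(dropped.are.loops  = all(k[loopPos] %/% n == k[loopPos] %% n),
    kept.are.not.loops = all(kept %/% n != kept %% n),
    one.loop.per.node  = length(loopPos) == n)
})
loopCheck
```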

# Use a fixed random number seed for reproducibility (hopefully).
set.seed(123)
# Generate the data frame of arcs.
start.time <- proc.time()
df2 <- seq.int(0, n^2 - 1, 1) %>%            # start with all arc indices
  extract(-seq.int(1, n^2, n + 1)) %>%      # weed out loops
  sample(size = nn, replace = FALSE) %>%    # take a random subset
  sort() %>%                                # sort into index order
  sapply(toArc) %>%                         # convert to arcs
  t() %>%                                   # transpose
  as.data.frame() %>%                       # make a dataframe
  add(1)                                    # revert to 1-based indexing
# Set the column names.
colnames(df2) <- c("ni", "nj")
# Show the time consumed.
proc.time() - start.time
   user  system elapsed 
  0.730   0.236   0.964 
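Since %/% and %% are vectorized, the per-arc sapply(toArc) call and the transpose could be replaced by decoding all sampled indices at once. The sketch below is a hypothetical variant (genArcs is not part of the original script); it keeps the pipeline up to sort() unchanged and builds the data frame directly. No timing is claimed for it here.

```r
library(magrittr)

# Hypothetical helper: same pipeline as above up to sort(), then a
# vectorized decode of all sampled indices at once.
genArcs <- function(n, nn) {
  k <- seq.int(0, n^2 - 1, 1) %>%          # all 0-based arc indices
    extract(-seq.int(1, n^2, n + 1)) %>%   # drop loops (1-based positions)
    sample(size = nn, replace = FALSE) %>% # random subset, no duplicates
    sort()                                 # index order = (tail, head) order
  # k %/% n is the tail and k %% n the head; add 1 for 1-based node labels.
  data.frame(ni = k %/% n + 1, nj = k %% n + 1)
}

# Small usage example.
set.seed(1)
small <- genArcs(10, 20)
head(small)
```

Because it makes the same calls to the random number generator, running set.seed(123) followed by genArcs(n, nn) should reproduce df2.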

How many arcs did we actually get (compared to how many we wanted)?

cat(paste0("Generated ", nrow(df2), " of desired ", nn, " arcs."))
Generated 250000 of desired 250000 arcs.

How many loops (arcs with head = tail) did we get?

cat(sum(df2[,1] == df2[,2]))
0

How many nodes are isolated (no arcs in or out)?

heads <- unique(df2$nj)   # nodes that are the head of at least one arc
tails <- unique(df2$ni)   # nodes that are the tail of at least one arc
orphans <- setdiff(1:n, union(tails, heads))
cat(paste0(length(heads), " of ", n, " nodes have indegree >= 1.\n"))
5000 of 5000 nodes have indegree >= 1.
cat(paste0(length(tails), " of ", n, " nodes have outdegree >= 1.\n"))
5000 of 5000 nodes have outdegree >= 1.
cat(paste0(length(orphans), " nodes are orphans."))
0 nodes are orphans.
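For this script the arcs are a uniformly random subset of size nn drawn without replacement from the N = n^2 - n non-loop arcs. A node v is the tail or head of m = 2(n - 1) of them, so the probability that v is isolated is the hypergeometric term choose(N - m, nn) / choose(N, nn). This sketch evaluates it on the log scale with lchoose() to avoid overflow:

```r
n <- 5000
nn <- n^2 / 100
N <- n^2 - n        # number of non-loop arcs available
m <- 2 * (n - 1)    # non-loop arcs having a given node v as tail or head
# log P(none of the nn arcs sampled without replacement touches v)
log.p.isolated <- lchoose(N - m, nn) - lchoose(N, nn)
expected.orphans2 <- n * exp(log.p.isolated)
expected.orphans2
```

As with Erwin's script, the expected number of orphans is negligible at this density.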