Title: | Data Science Box of Pandora Miscellaneous |
---|---|
Description: | Tool collection for common and not-so-common data science use cases. This includes custom-made algorithms for data management as well as value calculations that are hard to find elsewhere because of their specificity but would nonetheless be a waste to lose. Currently available functionality: find subgraphs in an edge list data.frame, find the mode or modes in a vector of values, extract specific regular expression groups, generate ISO time stamps that play well with file names, and generate URL parameter lists by expanding value combinations. |
Authors: | Peter Meissner [aut, cre] |
Maintainer: | Peter Meissner |
License: | GPL (>= 2) |
Version: | 0.4.0 |
Built: | 2024-11-20 04:04:53 UTC |
Source: | https://github.com/petermeissner/dsmisc |
df_defactorize
df_defactorize(df)
Argument | Description |
---|---|
df | a data.frame-like object |
Returns the same data.frame, except that factor columns have been converted to character columns.
df <- data.frame(
  a = 1:2,
  b = factor(c("a", "b")),
  c = as.character(letters[3:4]),
  stringsAsFactors = FALSE
)
vapply(df, class, "")
df_df <- df_defactorize(df)
vapply(df_df, class, "")
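For readers who want to see what the conversion amounts to, here is a minimal base R sketch. The helper `defactorize_sketch()` is hypothetical and is not the package's source; it assumes the goal is simply to turn every factor column into a character column and leave all other columns untouched.

```r
# Hypothetical helper, not dsmisc's implementation: convert factor columns
# of a data.frame to character and leave every other column unchanged.
defactorize_sketch <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  if (any(is_fac)) {
    df[is_fac] <- lapply(df[is_fac], as.character)
  }
  df
}

df <- data.frame(
  a = 1:2,
  b = factor(c("a", "b")),
  c = c("c", "d"),
  stringsAsFactors = FALSE
)
res <- defactorize_sketch(df)
vapply(res, class, "")
# a: "integer", b: "character", c: "character"
```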
Find and index subgraphs in an undirected graph.
graphs_find_subgraphs(id_1, id_2, verbose = 1L)
Argument | Description |
---|---|
id_1 | vector of integers giving the node ids at one end of each edge |
id_2 | vector of integers giving the node ids at the other end of each edge |
verbose | an integer indicating the amount of verbosity, useful for long-running tasks or to get more information about the workings of the algorithm; currently accepted values: 0, 1, 2 |
Input is given as two vectors in which each pair of node ids, id_1[i] and id_2[i], defines an edge between two nodes.
An integer vector of subgraph ids in which each distinct subgraph (a set of nodes that are all reachable from one another, with no node outside the set reachable from them) gets its own integer value. Integer values are assigned via
graphs_find_subgraphs(c(1, 2, 1, 5, 6, 6), c(2, 3, 3, 4, 5, 4), verbose = 0)
graphs_find_subgraphs(c(1, 2, 1, 5, 6, 6), c(2, 3, 3, 4, 5, 4), verbose = 2)
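The notion of subgraph used here corresponds to the connected components of an undirected graph. As an illustration only, the sketch below labels components with a simple union-find structure. `components_sketch()` is a hypothetical name, the `verbose` argument is not modeled, and returning one label per edge (numbered in order of first appearance) is an assumption rather than a description of what `graphs_find_subgraphs()` does internally.

```r
# Hypothetical sketch: connected components of an undirected graph given as
# an edge list, via union-find. Returns one component label per edge.
components_sketch <- function(id_1, id_2) {
  nodes <- unique(c(id_1, id_2))
  parent <- seq_along(nodes)            # every node starts as its own root
  idx_1 <- match(id_1, nodes)
  idx_2 <- match(id_2, nodes)

  # follow parent links until reaching a root
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # union step: attach the root of one endpoint to the root of the other
  for (k in seq_along(idx_1)) {
    r1 <- find_root(idx_1[k])
    r2 <- find_root(idx_2[k])
    if (r1 != r2) parent[r2] <- r1
  }

  # both endpoints of an edge share a root; relabel roots as 1, 2, ...
  # in order of first appearance
  roots <- vapply(idx_1, find_root, integer(1))
  match(roots, unique(roots))
}

components_sketch(c(1, 2, 1, 5, 6, 6), c(2, 3, 3, 4, 5, 4))
# [1] 1 1 1 2 2 2
```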
Calculate the mode.
stats_mode(x, multimodal = FALSE, warn = TRUE)
Argument | Description |
---|---|
x | vector to get the mode for |
multimodal | whether all modes should be returned when there is more than one |
warn | whether the function should warn about multimodal outcomes |
A vector containing the mode or modes.
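Conceptually, the mode is the value (or values) with the highest frequency. A minimal base R sketch of that computation follows. `mode_sketch()` is a hypothetical helper, and returning the first mode in order of appearance when `multimodal = FALSE` is an assumption, not documented behavior of `stats_mode()`.

```r
# Hypothetical helper: frequency-based mode that keeps the input's type.
mode_sketch <- function(x, multimodal = FALSE, warn = TRUE) {
  values <- unique(x)
  counts <- tabulate(match(x, values), nbins = length(values))
  modes <- values[counts == max(counts)]
  if (length(modes) > 1 && !multimodal) {
    if (warn) warning("multiple modes found; returning the first one")
    modes <- modes[1]
  }
  modes
}

mode_sketch(c(1, 2, 2, 3, 3), multimodal = TRUE)
# [1] 2 3
```

Using `unique()` plus `match()` instead of `table()` keeps numeric input numeric, since `table()` would return its categories as character names.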
Calculate the mode, returning all modes when several values share the highest frequency.
stats_mode_multi(x)
Argument | Description |
---|---|
x | vector to get the modes for |
A vector containing all modes.
Extract Regular Expression Groups
str_group_extract(string, pattern, group = NULL, nas = TRUE)
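Argument | Description |
---|---|
string | character vector to extract from |
pattern | regular expression containing the groups to match |
group | index or indices of the groups to extract |
nas | whether to return NA values (TRUE) or filter them out (FALSE) |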
A character vector or character matrix.
strings <- paste(LETTERS, seq_along(LETTERS), sep = "_")
str_group_extract(strings, "([\\w])_(\\d+)")
str_group_extract(strings, "([\\w])_(\\d+)", 1)
str_group_extract(strings, "([\\w])_(\\d+)", 2)
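Base R can extract capture groups with `regexec()` and `regmatches()`. The sketch below shows one way to arrive at the vector-or-matrix result described above. `groups_sketch()` is hypothetical, it assumes at least one input string matches, and it treats `nas = FALSE` as dropping rows that contain any NA, which may differ from the package's handling.

```r
# Hypothetical helper built on base R regexec()/regmatches().
groups_sketch <- function(string, pattern, group = NULL, nas = TRUE) {
  hits <- regmatches(string, regexec(pattern, string))
  # each hit is c(full_match, group_1, group_2, ...) or character(0)
  n_grp <- max(lengths(hits)) - 1L
  rows <- lapply(hits, function(h) {
    if (length(h) > 0) h[-1] else rep(NA_character_, n_grp)
  })
  # one row per input string, one column per capture group
  res <- matrix(unlist(rows), ncol = n_grp, byrow = TRUE)
  if (!is.null(group)) res <- res[, group, drop = FALSE]
  if (!nas) res <- res[rowSums(is.na(res)) == 0, , drop = FALSE]
  # a single group collapses to a character vector
  if (ncol(res) == 1L) res[, 1] else res
}

strings <- paste(LETTERS[1:3], 1:3, sep = "_")
groups_sketch(strings, "([A-Z])_([0-9]+)")     # 3 x 2 character matrix
groups_sketch(strings, "([A-Z])_([0-9]+)", 2)  # "1" "2" "3"
```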
Generate ISO time stamps that are safe to use in file names.
time_stamp(ts = Sys.time(), sep = c("-", "_", "_"))
Argument | Description |
---|---|
ts | one or more POSIX time stamps |
sep | separators used when formatting the time stamp |
Returns time stamp strings in the format yyyy-mm-dd_HH_MM_SS, which can be used safely in file names on common operating systems.
time_stamp()
time_stamp(Sys.time() - 10000)
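The default layout can be reproduced with `format()` on a POSIXct value. In this hypothetical `stamp_sketch()`, the three separators are assumed to go, in order, between date parts, between date and time, and between time parts, which matches the yyyy-mm-dd_HH_MM_SS layout described above. Colons are avoided because they are not allowed in Windows file names.

```r
# Hypothetical helper: build a strftime-style format string from separators.
# Assumes the separators contain no "%", which format() would interpret.
stamp_sketch <- function(ts = Sys.time(), sep = c("-", "_", "_")) {
  fmt <- paste0(
    "%Y", sep[1], "%m", sep[1], "%d",   # date part
    sep[2],                             # date/time separator
    "%H", sep[3], "%M", sep[3], "%S"    # time part
  )
  format(ts, format = fmt)
}

stamp_sketch(as.POSIXct("2024-11-20 04:04:53", tz = "UTC"))
# [1] "2024-11-20_04_04_53"
```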
Fit a number into the range of an index
tool_i_fit_index(i, index)
Argument | Description |
---|---|
i | number(s) to fit into the index range |
index | size of the index |
A number between 1 and index, obtained by circularly mapping i onto that range.
tool_i_fit_index(-2:6, 3)
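Circular mapping onto 1..n is usually done with modular arithmetic. A one-line sketch, assuming the mapping is ((i - 1) mod n) + 1, so that 0 maps to n and n + 1 maps back to 1. `fit_index_sketch()` is hypothetical and not confirmed to match `tool_i_fit_index()` for every input.

```r
# Hypothetical helper: map any integer i onto 1..n circularly.
# R's %% returns a result with the sign of the divisor, so negative i work too.
fit_index_sketch <- function(i, n) ((i - 1) %% n) + 1

fit_index_sketch(-2:6, 3)
# [1] 1 2 3 1 2 3 1 2 3
```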
Subset an object even if the index is out of range
tool_i_fit_obj(i, obj)
Argument | Description |
---|---|
i | number(s) to fit into the index range |
obj | object to subset |
Elements of obj, indexed circularly by mapping i onto the range 1 to the size of obj.
tool_i_fit_obj(-2:6, 3)
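Subsetting then amounts to indexing the object with the circularly mapped positions. A minimal sketch under the assumed ((i - 1) mod length) + 1 mapping; `fit_obj_sketch()` is hypothetical.

```r
# Hypothetical helper: index obj circularly so that any integer i is valid.
fit_obj_sketch <- function(i, obj) obj[((i - 1) %% length(obj)) + 1]

fit_obj_sketch(-2:6, letters[1:3])
# [1] "a" "b" "c" "a" "b" "c" "a" "b" "c"
fit_obj_sketch(-2:6, 3)   # a length-1 object: every position maps to it
```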
Generate URL parameter combinations from sets of parameter values.
web_gen_param_list_expand(..., sep_1 = "=", sep_2 = "&")
Argument | Description |
---|---|
... | multiple vectors passed as named arguments, or a single list or data.frame |
sep_1 | separator between key and value |
sep_2 | separator between key-value pairs |
A character vector of assembled query strings, one per combination of parameter values.
web_gen_param_list_expand(q = "beluga", lang = c("de", "en"))
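Expanding all value combinations is what `expand.grid()` does in base R; pasting keys and values together then yields the query strings. `params_sketch()` is a hypothetical helper: it assumes all arguments are named and does not URL-encode values (`utils::URLencode()` could be applied for that).

```r
# Hypothetical helper: every combination of the named value vectors,
# assembled as key<sep_1>value pairs joined by sep_2.
params_sketch <- function(..., sep_1 = "=", sep_2 = "&") {
  grid <- expand.grid(..., stringsAsFactors = FALSE)
  pairs <- lapply(names(grid), function(key) paste0(key, sep_1, grid[[key]]))
  do.call(paste, c(pairs, sep = sep_2))
}

params_sketch(q = "beluga", lang = c("de", "en"))
# [1] "q=beluga&lang=de" "q=beluga&lang=en"
```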