| Port variant | std |
| Summary | Fast tokenization of natural language text |
| Package version | 0.3.0 |
| Homepage | https://docs.ropensci.org/tokenizers/ |
| Keywords | cran |
| Maintainer | CRAN Automaton |
| License | Not yet specified |
| Other variants | There are no other variants. |
| Last modified | 09 AUG 2024, 21:24:17 UTC |
| Port created | 15 APR 2020, 06:14:40 UTC |
| single | tokenizers: Fast, Consistent Tokenization of Natural Language Text. Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, and regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'. |
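The description above mentions a consistent interface shared by all tokenizers. A minimal sketch of that interface, assuming the package is installed (the sample text and token counts here are illustrative, not from the port page):

```r
# Illustrative use of the tokenizers package described above.
library(tokenizers)

text <- "Tokenization splits text into tokens. It is fast and consistent."

# Word tokens: returns a list with one character vector per input document;
# by default the text is lowercased and punctuation is stripped.
words <- tokenize_words(text)

# Shingled bigrams over the same text, via the same list-in/list-out interface.
bigrams <- tokenize_ngrams(text, n = 2)

# Counting helpers are vectorised over documents in the same way.
n_words <- count_words(text)
n_sentences <- count_sentences(text)
```

Every `tokenize_*` function accepts a character vector of documents and returns a list of the same length, which is what lets downstream packages such as R-tidytext swap tokenizers freely.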
| Build (only) | gmake:primary:std R:primary:std icu:dev:std |
| Build and Runtime | R-stringi:single:std R-Rcpp:single:std R-SnowballC:single:std |
| Runtime (only) | R:primary:std R:nls:std |
| main | mirror://CRAN/src/contrib https://loki.dragonflybsd.org/cranfiles/ |
| R-tidytext:std | Text mining tool |