| Port variant | std |
| Summary | Fast tokenization of natural language text |
| Package version | 0.3.0 |
| Homepage | https://docs.ropensci.org/tokenizers/ |
| Keywords | cran |
| Maintainer | CRAN Automaton |
| License | Not yet specified |
| Other variants | There are no other variants. |
| Last modified | 09 AUG 2024, 21:24:17 UTC |
| Port created | 15 APR 2020, 06:14:40 UTC |
| single | tokenizers: Fast, Consistent Tokenization of Natural Language Text. Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, and regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'. |
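The description above mentions a consistent interface shared by all tokenizers. A minimal sketch of that interface, assuming the package is installed (the sample text and token counts here are illustrative, not from the port page):

```r
# Illustrative use of the tokenizers package described above.
library(tokenizers)

text <- "Tokenization splits text into tokens. It is fast and consistent."

# Word tokens: returns a list with one character vector per input document;
# by default the text is lowercased and punctuation is stripped.
words <- tokenize_words(text)

# Shingled bigrams over the same text, via the same list-in/list-out interface.
bigrams <- tokenize_ngrams(text, n = 2)

# Counting helpers are vectorised over documents in the same way.
n_words <- count_words(text)
n_sentences <- count_sentences(text)
```

Every `tokenize_*` function accepts a character vector of documents and returns a list of the same length, which is what lets downstream packages such as R-tidytext swap tokenizers freely.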
| Build (only) | gmake:primary:std R:primary:std icu:dev:std |
| Build and Runtime | R-stringi:single:std R-Rcpp:single:std R-SnowballC:single:std |
| Runtime (only) | R:primary:std R:nls:std |
| main | mirror://CRAN/src/contrib https://loki.dragonflybsd.org/cranfiles/ |
| R-tidytext:std | Text mining tool |