---
title: Detecting all neighbors within range
author:
- name: Aaron Lun
  affiliation: Cancer Research UK Cambridge Institute, Cambridge, United Kingdom
date: "Revised: 28 September 2018"
output:
  BiocStyle::html_document:
    toc_float: true
package: BiocNeighbors
vignette: >
  %\VignetteIndexEntry{3. Detecting neighbors within range}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: ref.bib
---

```{r, echo=FALSE, results="hide", message=FALSE}
require(knitr)
opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
library(BiocNeighbors)
```

# Identifying all neighbors within range

Another application of the KMKNN or VP tree algorithms is to identify all neighboring points within a certain distance^[The default here is Euclidean, but again, we can set `distance="Manhattan"` in the `BNPARAM` object if so desired.] of the current point.
We first mock up some data:

```{r}
# 10000 points in 20 dimensions, sampled uniformly from the unit hypercube.
nobs <- 10000
ndim <- 20
data <- matrix(runif(nobs*ndim), ncol=ndim)
```

We apply the `findNeighbors()` function to `data`:

```{r}
# Find all points within a distance of 1 of each point in 'data'.
fout <- findNeighbors(data, threshold=1)
head(fout$index)
head(fout$distance)
```

Each entry of the `index` list corresponds to a point in `data` and contains the row indices of the points in `data` that lie within `threshold` of that point.
For example, the 3rd point in `data` has the following neighbors:

```{r}
fout$index[[3]]
```

... with the following distances to those neighbors:

```{r}
fout$distance[[3]]
```

Note that, for this function, the reported neighbors are _not_ sorted by distance.
The order of the output is completely arbitrary and will vary depending on the random seed.
However, the identity of the neighbors is fully deterministic.

# Querying another data set for neighbors

The `queryNeighbors()` function is also provided for identifying all points within a certain distance of a query point.
Given a query data set:

```{r}
# 1000 query points with the same dimensionality as 'data'.
nquery <- 1000
ndim <- 20
query <- matrix(runif(nquery*ndim), ncol=ndim)
```

... we apply the `queryNeighbors()` function:

```{r}
# For each row of 'query', find all points in 'data' within a distance of 1.
qout <- queryNeighbors(data, query, threshold=1)
length(qout$index)
```

... where each entry of `qout$index` corresponds to a row of `query` and contains its neighbors in `data`.
Again, the order of the output is arbitrary but the identity of the neighbors is deterministic.

# Further options

Most of the options described for `findKNN()` are also applicable here.
For example:

- `subset` to identify neighbors for a subset of points.
- `get.distance` to avoid retrieving distances when unnecessary.
- `BPPARAM` to parallelize the calculations across multiple workers.
- `raw.index` to return the raw indices from a precomputed index.

Note that the argument for a precomputed index is `BNINDEX`:

```{r}
# Build the index once and reuse it for both searches.
pre <- buildIndex(data, BNPARAM=KmknnParam())
fout.pre <- findNeighbors(BNINDEX=pre, threshold=1)
qout.pre <- queryNeighbors(BNINDEX=pre, query=query, threshold=1)
```

Users are referred to the documentation of each function for specific details.

# Session information

```{r}
sessionInfo()
```
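
# Cross-checking with a brute-force search

To make the meaning of a range search concrete, the chunk below is a minimal base-R sketch of a brute-force equivalent for a single point.
`brute_neighbors()` is a hypothetical helper introduced here purely for illustration; it is not part of BiocNeighbors and says nothing about how the KMKNN or VP tree algorithms are implemented.
The sketch assumes Euclidean distances, counts points at exactly `threshold` as neighbors, and includes the point itself in its own result; all three are assumptions of this example.

```{r}
# Hypothetical helper (not part of BiocNeighbors): brute-force range search
# for row 'i' of 'X', assuming Euclidean distance.
brute_neighbors <- function(X, i, threshold) {
    # t(X) has one column per point, so subtracting X[i,] recycles the
    # coordinates of point 'i' down each column.
    d <- sqrt(colSums((t(X) - X[i,])^2))
    # Assumption: points at exactly 'threshold' are kept, and so is 'i' itself.
    keep <- which(d <= threshold)
    list(index=keep, distance=d[keep])
}

# A small mock data set, so that the exhaustive search stays cheap.
set.seed(100)
small <- matrix(runif(200*5), ncol=5)
ref <- brute_neighbors(small, 3, threshold=0.5)
ref$index
ref$distance
```

Running `findNeighbors(small, threshold=0.5)` and sorting the third entry of its `index` should give the same set of points, possibly differing in whether the 3rd point itself is reported.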