Timing Optimization for Autocomplete

To display suggestions quickly without sending too many requests, we do the following:
  • Debouncing: If you are typing quickly, we won’t make a request on each keystroke. Instead, we wait until you have paused (see the first sketch after this list).
  • Caching: If your cursor is in a position that we’ve already generated a completion for, that completion is reused. For example, if you backspace, we’ll be able to immediately show the suggestion you saw before (see the second sketch after this list).
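
Here is a minimal debounce sketch in TypeScript. The 300 ms delay and the `requestCompletion` function are illustrative assumptions, not Continue’s actual values or API:

```typescript
// Wrap a function so it only fires after a quiet period of `delayMs`.
function debounce<Args extends unknown[]>(
  fn: (...args: Args) => void,
  delayMs: number
): (...args: Args) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Args) => {
    if (timer !== undefined) {
      clearTimeout(timer);
    }
    // Restart the countdown on every keystroke; fn only runs once no
    // new keystroke arrives within delayMs.
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Hypothetical usage: call this on every keystroke, but only the last
// call before a 300 ms pause actually triggers a completion request.
declare function requestCompletion(prefix: string): void;
const debouncedRequest = debounce(requestCompletion, 300);
```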
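
And a minimal sketch of the cache, keyed by the text before the cursor. The names are hypothetical, and the prefix-matching fallback is one plausible strategy rather than necessarily what Continue does:

```typescript
const completionCache = new Map<string, string>();

// Store a completion under the prefix it was generated for.
function storeCompletion(prefix: string, completion: string): void {
  completionCache.set(prefix, completion);
}

// Reuse a completion when the cursor returns to a known position (e.g.
// after a backspace), or when the user has typed characters that match
// the start of a cached completion.
function lookupCompletion(prefix: string): string | undefined {
  const exact = completionCache.get(prefix);
  if (exact !== undefined) return exact;
  for (const [cachedPrefix, completion] of completionCache) {
    if (
      prefix.startsWith(cachedPrefix) &&
      completion.startsWith(prefix.slice(cachedPrefix.length))
    ) {
      // Trim off the part of the completion the user has already typed.
      return completion.slice(prefix.length - cachedPrefix.length);
    }
  }
  return undefined;
}
```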

Context Retrieval from Your Codebase

Continue uses a number of retrieval methods to find relevant snippets from your codebase to include in the prompt.
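
As a rough illustration, multiple sources can be queried in parallel and their snippets ranked into a fixed token budget. The interfaces below and the 4-characters-per-token estimate are assumptions for the sketch, not Continue’s actual retrieval pipeline:

```typescript
interface Snippet {
  filepath: string;
  contents: string;
  score: number; // higher means more relevant
}

interface RetrievalSource {
  retrieve(cursorFile: string, prefix: string): Promise<Snippet[]>;
}

async function gatherContext(
  sources: RetrievalSource[],
  cursorFile: string,
  prefix: string,
  tokenBudget: number
): Promise<Snippet[]> {
  // Query every source in parallel and pool the results.
  const all = (
    await Promise.all(sources.map((s) => s.retrieve(cursorFile, prefix)))
  ).flat();

  // Keep the highest-scoring snippets that fit the prompt's budget,
  // estimating roughly 4 characters per token.
  all.sort((a, b) => b.score - a.score);
  const kept: Snippet[] = [];
  let used = 0;
  for (const snippet of all) {
    const cost = Math.ceil(snippet.contents.length / 4);
    if (used + cost > tokenBudget) continue;
    kept.push(snippet);
    used += cost;
  }
  return kept;
}
```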

Filtering and Post-Processing AI Suggestions

Language models aren’t perfect, but their output can be brought much closer with careful cleanup. We do extensive post-processing on responses before displaying a suggestion (a combined sketch follows this list), including:
  • Removing special tokens
  • Stopping early when the model begins re-generating code that already exists, to avoid long, irrelevant output
  • Fixing indentation for proper formatting
  • Occasionally discarding low-quality responses, such as those with excessive repetition
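
A hedged sketch of what such a pipeline might look like. The special-token list, the repetition threshold, and the early-stop heuristic are illustrative assumptions, and indentation fixing is omitted for brevity:

```typescript
// Tokens that some models emit verbatim and should never reach the editor.
const SPECIAL_TOKENS = ["<|endoftext|>", "<|fim_middle|>", "</s>"];

function postProcess(
  raw: string,
  lineAfterCursor: string
): string | undefined {
  let text = raw;

  // 1. Remove special tokens.
  for (const token of SPECIAL_TOKENS) {
    text = text.split(token).join("");
  }

  // 2. Stop early if the model starts re-generating the code that
  //    already follows the cursor.
  const lines = text.split("\n");
  const cutoff = lines.findIndex(
    (line) => line.trim() !== "" && line.trim() === lineAfterCursor.trim()
  );
  if (cutoff !== -1) {
    text = lines.slice(0, cutoff).join("\n");
  }

  // 3. Discard low-quality responses, e.g. the same non-empty line
  //    repeated more than three times.
  const counts = new Map<string, number>();
  for (const line of text.split("\n")) {
    const key = line.trim();
    if (key === "") continue;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  if (Math.max(0, ...counts.values()) > 3) {
    return undefined; // treated as "no suggestion"
  }

  return text;
}
```
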
You can learn more about how it works in the Autocomplete deep dive.