Understand how Continue’s autocomplete works, including timing optimization, context retrieval from your codebase, and filtering to improve AI code suggestions.
To display suggestions quickly without sending too many requests, we do the following:
Debouncing: If you are typing quickly, we won’t make a request on each keystroke. Instead, we wait until you pause before sending one.
Caching: If your cursor is in a position that we’ve already generated a completion for, this completion is reused. For example, if you backspace, we’ll be able to immediately show the suggestion you saw before.
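The two techniques above can be sketched as follows. This is a minimal illustration, not Continue’s actual implementation: `debounce`, `CompletionCache`, `fakeModel`, `onKeystroke`, the 150 ms delay, and the choice of the text before the cursor as the cache key are all assumptions made for the example.

```typescript
// Illustrative sketch only; every name here is hypothetical.

/** Runs `fn` only after `delayMs` has passed with no further calls. */
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
): { call: (...args: A) => void; cancel: () => void } {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const cancel = (): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
  };
  const call = (...args: A): void => {
    cancel(); // each new keystroke restarts the wait
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, delayMs);
  };
  return { call, cancel };
}

/** Small cache keyed by the text before the cursor; evicts the oldest entry first. */
class CompletionCache {
  private readonly entries = new Map<string, string>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = 100) {
    this.maxEntries = maxEntries;
  }

  get(prefix: string): string | undefined {
    return this.entries.get(prefix);
  }

  set(prefix: string, completion: string): void {
    if (this.entries.has(prefix)) {
      this.entries.delete(prefix); // re-insert so this key counts as newest
    } else if (this.entries.size >= this.maxEntries) {
      // A Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(prefix, completion);
  }
}

// Placeholder standing in for a real model request.
function fakeModel(prefix: string): string {
  return `/* completion for ${prefix.length} chars of prefix */`;
}

const cache = new CompletionCache();
const debouncedRequest = debounce((prefix: string) => {
  cache.set(prefix, fakeModel(prefix));
}, 150);

/** Returns a cached suggestion right away, or schedules a debounced request. */
function onKeystroke(prefix: string): string | undefined {
  const cached = cache.get(prefix);
  if (cached !== undefined) return cached; // e.g. after a backspace
  debouncedRequest.call(prefix);
  return undefined;
}
```

Checking the cache before debouncing is what lets a previously seen suggestion reappear immediately after a backspace, while new cursor positions still wait for a pause in typing.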
Language models aren’t perfect, but their output can be brought much closer to what you want by adjusting it. We do extensive post-processing on responses before displaying a suggestion, including:
Removing special tokens
Stopping early when regenerating code to avoid long, irrelevant output
Fixing indentation for proper formatting
Occasionally discarding low-quality responses, such as those with excessive repetition
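A rough sketch of how such a post-processing pipeline might be chained is shown below. It is an assumption-laden illustration, not Continue’s code: the function names, the token list, the "stop when the completion reproduces the line after the cursor" heuristic, the tab-to-indent-unit rule, and the repetition threshold are all hypothetical choices made for the example.

```typescript
// Illustrative sketch only; names, tokens, and thresholds are hypothetical.

// Example special tokens that some models emit; the real list depends on the model.
const SPECIAL_TOKENS = ["<|endoftext|>", "<fim_prefix>", "<fim_suffix>", "<fim_middle>"];

function removeSpecialTokens(text: string): string {
  let out = text;
  for (const token of SPECIAL_TOKENS) {
    out = out.split(token).join("");
  }
  return out;
}

/** Cuts the completion at the first line that repeats the line after the cursor. */
function stopAtExistingCode(text: string, nextLine: string): string {
  const target = nextLine.trim();
  if (target === "") return text;
  const lines = text.split("\n");
  const idx = lines.findIndex((line) => line.trim() === target);
  return idx === -1 ? text : lines.slice(0, idx).join("\n");
}

/** Replaces leading tabs with the file's indent unit. */
function normalizeIndentation(text: string, indentUnit: string): string {
  return text
    .split("\n")
    .map((line) => line.replace(/^\t+/, (tabs) => indentUnit.repeat(tabs.length)))
    .join("\n");
}

/** True if the same non-empty line appears `maxRepeats` or more times in a row. */
function isTooRepetitive(text: string, maxRepeats: number = 3): boolean {
  const lines = text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "");
  let run = 1;
  for (let i = 1; i < lines.length; i++) {
    run = lines[i] === lines[i - 1] ? run + 1 : 1;
    if (run >= maxRepeats) return true;
  }
  return false;
}

/** Returns the cleaned completion, or undefined if it should be discarded. */
function postprocess(
  raw: string,
  nextLine: string,
  indentUnit: string = "  ",
): string | undefined {
  let text = removeSpecialTokens(raw);
  text = stopAtExistingCode(text, nextLine);
  text = normalizeIndentation(text, indentUnit);
  if (text.trim() === "" || isTooRepetitive(text)) return undefined;
  return text;
}
```

Returning `undefined` rather than an empty string makes the discard case explicit, so a caller can simply show nothing when a response fails these checks.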