import bisect
import sys
HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]
ROW_FMT = '{0:2d} @ {1:2d} {2}{0:<2d}'
def demo(bisect_fn):
for needle in reversed(NEEDLES):
position = bisect_fn(HAYSTACK, needle) # (1)
offset = position * ' |' # (2)
print(ROW_FMT.format(needle, position, offset)) # (3)
if __name__ == '__main__':
if sys.argv[-1] == 'left': # (4)
bisect_fn = bisect.bisect_left
else:
bisect_fn = bisect.bisect
print('DEMO:', bisect_fn.__name__) # (5)
print('haystack ->', ' '.join(f'{n:2}' for n in HAYSTACK))
demo(bisect_fn)
Managing Ordered Sequences with Bisect
The bisect module offers two main functions that use the binary search algorithm to quickly find and insert items in any sorted sequence.
Searching with bisect
bisect(haystack, needle) does a binary search for needle in haystack—which must be a sorted sequence—to locate the position where needle can be inserted while maintaining haystack in ascending order.
In other words, all items appearing up to that position are less than or equal to needle.
You could use the result of bisect(haystack, needle) as the index argument to haystack.insert(index, needle)—however, using insort does both steps, and is faster.
|
Tip
|
Raymond Hettinger wrote a |
Example 1 uses a carefully chosen set of "needles" to demonstrate the insert positions returned by bisect.
Its output is in Figure 1.
-
Use the chosen
bisectfunction to get the insertion point. -
Build a pattern of vertical bars proportional to the
offset. -
Print formatted row showing needle and insertion point.
-
Choose the
bisectfunction to use according to the last command-line argument. -
Print header with name of function selected.
The behavior of bisect can be fine-tuned in two ways.
First, a pair of optional arguments, lo and hi, allow narrowing the region in the sequence to be searched when inserting. lo defaults to 0 and hi to the len() of the sequence.
Second, bisect is actually an alias for bisect_right, and there is a sister function called bisect_left.
Their difference is apparent only when the needle compares equal to an item in the list: bisect_right returns an insertion point after the existing item, and bisect_left returns the position of the existing item, so insertion would occur before it.
With simple types like int, inserting before or after makes no difference,
but if the sequence contains objects that are distinct yet compare equal, then it may be relevant.
For example, 1 and 1.0 are distinct, but 1 == 1.0 is True. Figure 2 shows the result of using bisect_left.
An interesting application of bisect is to perform table lookups by numeric values—for example,
to convert test scores to letter grades, as in Example 2.
>>> breakpoints = [60, 70, 80, 90]
>>> grades='FDCBA'
>>> def grade(score):
... i = bisect.bisect(breakpoints, score)
... return grades[i]
...
>>> [grade(score) for score in [55, 60, 65, 70, 75, 80, 85, 90, 95]]
['F', 'D', 'D', 'C', 'C', 'B', 'B', 'A', 'A']
The code in Example 2 is from the bisect module documentation, which also lists functions to use bisect as a faster replacement for the index method when searching through long ordered sequences of numbers.
When used for table lookups, bisect_left produces very different results[1].
Note the letter grade results in Example 3.
bisect_left maps a score of 60 to grade 'F', not 'D' as in Example 2.>>> breakpoints = [60, 70, 80, 90]
>>> grades='FDCBA'
>>> def grade(score):
... i = bisect.bisect_left(breakpoints, score)
... return grades[i]
...
>>> [grade(score) for score in [55, 60, 65, 70, 75, 80, 85, 90, 95]]
['F', 'F', 'D', 'D', 'C', 'C', 'B', 'B', 'A']
These functions are not only used for searching, but also for inserting items in sorted sequences, as the following section shows.
Inserting with insort
Sorting is expensive, so once you have a sorted sequence, it’s good to keep it that way.
That is why bisect.insort was created.
insort(seq, item) inserts item into seq so as to keep seq in ascending order.
See Example 4 and its output in Figure 3.
import bisect
import random
SIZE = 7
random.seed(1729)
my_list = []
for i in range(SIZE):
new_item = random.randrange(SIZE * 2)
bisect.insort(my_list, new_item)
print(f'{new_item:2d} -> {my_list}')
Like bisect, insort takes optional lo, hi arguments to limit the search to a sub-sequence.
There is also an insort_left variation that uses bisect_left to find insertion points.