How to test if a list contains another list as a contiguous subsequence?

How can I test if a list contains another list (ie. it’s a contiguous subsequence). Say there was a function called contains:

contains([1,2], [-1, 0, 1, 2]) # Returns [2, 3] (contains returns [start, end])
contains([1,3], [-1, 0, 1, 2]) # Returns False
contains([1, 2], [[1, 2], 3]) # Returns False
contains([[1, 2]], [[1, 2], 3]) # Returns [0, 0]

Edit:

contains([2, 1], [-1, 0, 1, 2]) # Returns False
contains([-1, 1, 2], [-1, 0, 1, 2]) # Returns False
contains([0, 1, 2], [-1, 0, 1, 2]) # Returns [1, 3]

Contents hide

Answers:

Method 1

Method 2

Method 3

Method 4

Method 5

Method 6

Method 7

Method 8

Method 9

Used methods are:

Code for Time analysis (averaging over 1000 iterations):

Output:

Conclusion: In this case, Slicing operation proves to be the fastest

Method 10

Method 11

Method 12

Method 13

Method 14

Method 15

Method 16

Method 17

Method 18

Method 19

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

If all items are unique, you can use sets.

>>> items = set([-1, 0, 1, 2])
>>> set([1, 2]).issubset(items)
True
>>> set([1, 3]).issubset(items)
False

Method 2

There’s an all() and any() function to do this.
To check if big contains ALL elements in small

result = all(elem in big for elem in small)

To check if small contains ANY elements in big

result = any(elem in big for elem in small)

the variable result would be boolean (TRUE/FALSE).

Method 3

Here is my version:

def contains(small, big):
    for i in xrange(len(big)-len(small)+1):
        for j in xrange(len(small)):
            if big[i+j] != small[j]:
                break
        else:
            return i, i+len(small)
    return False

It returns a tuple of (start, end+1) since I think that is more pythonic, as Andrew Jaffe points out in his comment. It does not slice any sublists so should be reasonably efficient.

One point of interest for newbies is that it uses the else clause on the for statement – this is not something I use very often but can be invaluable in situations like this.

This is identical to finding substrings in a string, so for large lists it may be more efficient to implement something like the Boyer-Moore algorithm.

Note: If you are using Python3, change xrange to range.

Method 4

May I humbly suggest the Rabin-Karp algorithm if the big list is really big. The link even contains almost-usable code in almost-Python.

Method 5

This works and is fairly fast since it does the linear searching using the builtin list.index() method and == operator:

def contains(sub, pri):
    M, N = len(pri), len(sub)
    i, LAST = 0, M-N+1
    while True:
        try:
            found = pri.index(sub[0], i, LAST) # find first elem in sub
        except ValueError:
            return False
        if pri[found:found+N] == sub:
            return [found, found+N-1]
        else:
            i = found+1

Method 6

After OP’s edit:

def contains(small, big):
    for i in xrange(1 + len(big) - len(small)):
        if small == big[i:i+len(small)]:
            return i, i + len(small) - 1
    return False

Method 7

If we refine the problem talking about testing if a list contains another list with as a sequence, the answer could be the next one-liner:

def contains(subseq, inseq):
    return any(inseq[pos:pos + len(subseq)] == subseq for pos in range(0, len(inseq) - len(subseq) + 1))

Here unit tests I used to tune up this one-liner:

https://gist.github.com/anonymous/6910a85b4978daee137f

Method 8

a=[[1,2] , [3,4] , [0,5,4]]
print(a.__contains__([0,5,4]))

It provides true output.

a=[[1,2] , [3,4] , [0,5,4]]
print(a.__contains__([1,3]))

It provides false output.

Method 9

I’ve Summarized and evaluated Time taken by different techniques

Used methods are:

def containsUsingStr(sequence, element:list):
    return str(element)[1:-1] in str(sequence)[1:-1]


def containsUsingIndexing(sequence, element:list):
    lS, lE = len(sequence), len(element)
    for i in range(lS - lE + 1):
        for j in range(lE):
            if sequence[i+j] != element[j]: break
        else: return True
    return False


def containsUsingSlicing(sequence, element:list):
    lS, lE = len(sequence), len(element)
    for i in range(lS - lE + 1):
        if sequence[i : i+lE] == element: return True
    return False


def containsUsingAny(sequence:list, element:list):
    lE = len(element)
    return any(element == sequence[i:i+lE] for i in range(len(sequence)-lE+1))

Code for Time analysis (averaging over 1000 iterations):

from time import perf_counter

functions = (containsUsingStr, containsUsingIndexing, containsUsingSlicing, containsUsingAny)
fCount = len(functions)


for func in functions:
    print(str.ljust(f'Function : {func.__name__}', 32), end='   ::   Return Values: ')
    print(func([1,2,3,4,5,5], [3,4,5,5]) , end=', ')
    print(func([1,2,3,4,5,5], [1,3,4,5]))



avg_times = [0]*fCount
for _ in range(1000):
    perf_times = []
    for func in functions:
        startTime = perf_counter()
        func([1,2,3,4,5,5], [3,4,5,5])
        timeTaken = perf_counter()-startTime
        perf_times.append(timeTaken)
        

    for t in range(fCount): avg_times[t] += perf_times[t]

minTime = min(avg_times)
print("nn Ratio of Time of Executions : ", ' : '.join(map(lambda x: str(round(x/minTime, 4)), avg_times)))

Output:

Conclusion: In this case, Slicing operation proves to be the fastest

Method 10

Here’s a straightforward algorithm that uses list methods:

#!/usr/bin/env python

def list_find(what, where):
    """Find `what` list in the `where` list.

    Return index in `where` where `what` starts
    or -1 if no such index.

    >>> f = list_find
    >>> f([2, 1], [-1, 0, 1, 2])
    -1
    >>> f([-1, 1, 2], [-1, 0, 1, 2])
    -1
    >>> f([0, 1, 2], [-1, 0, 1, 2])
    1
    >>> f([1,2], [-1, 0, 1, 2])
    2
    >>> f([1,3], [-1, 0, 1, 2])
    -1
    >>> f([1, 2], [[1, 2], 3])
    -1
    >>> f([[1, 2]], [[1, 2], 3])
    0
    """
    if not what: # empty list is always found
        return 0
    try:
        index = 0
        while True:
            index = where.index(what[0], index)
            if where[index:index+len(what)] == what:
                return index # found
            index += 1 # try next position
    except ValueError:
        return -1 # not found

def contains(what, where):
    """Return [start, end+1] if found else empty list."""
    i = list_find(what, where)
    return [i, i + len(what)] if i >= 0 else [] #NOTE: bool([]) == False

if __name__=="__main__":
    import doctest; doctest.testmod()

Method 11

Smallest code:

def contains(a,b):
    str(a)[1:-1].find(str(b)[1:-1])>=0

Method 12

Here is my answer. This function will help you to find out whether B is a sub-list of A. Time complexity is O(n).

`def does_A_contain_B(A, B): #remember now A is the larger list
    b_size = len(B)
    for a_index in range(0, len(A)):
        if A[a_index : a_index+b_size]==B:
            return True
    else:
        return False`

Method 13

I tried to make this as efficient as possible.

It uses a generator; those unfamiliar with these beasts are advised to check out their documentation and that of yield expressions.

Basically it creates a generator of values from the subsequence that can be reset by sending it a true value. If the generator is reset, it starts yielding again from the beginning of sub.

Then it just compares successive values of sequence with the generator yields, resetting the generator if they don’t match.

When the generator runs out of values, i.e. reaches the end of sub without being reset, that means that we’ve found our match.

Since it works for any sequence, you can even use it on strings, in which case it behaves similarly to str.find, except that it returns False instead of -1.

As a further note: I think that the second value of the returned tuple should, in keeping with Python standards, normally be one higher. i.e. "string"[0:2] == "st". But the spec says otherwise, so that’s how this works.

It depends on if this is meant to be a general-purpose routine or if it’s implementing some specific goal; in the latter case it might be better to implement a general-purpose routine and then wrap it in a function which twiddles the return value to suit the spec.

def reiterator(sub):
    """Yield elements of a sequence, resetting if sent ``True``."""
    it = iter(sub)
    while True:
        if (yield it.next()):
            it = iter(sub)

def find_in_sequence(sub, sequence):
    """Find a subsequence in a sequence.

    >>> find_in_sequence([2, 1], [-1, 0, 1, 2])
    False
    >>> find_in_sequence([-1, 1, 2], [-1, 0, 1, 2])
    False
    >>> find_in_sequence([0, 1, 2], [-1, 0, 1, 2])
    (1, 3)
    >>> find_in_sequence("subsequence",
    ...                  "This sequence contains a subsequence.")
    (25, 35)
    >>> find_in_sequence("subsequence", "This one doesn't.")
    False

    """
    start = None
    sub_items = reiterator(sub)
    sub_item = sub_items.next()
    for index, item in enumerate(sequence):
        if item == sub_item:
            if start is None: start = index
        else:
            start = None
        try:
            sub_item = sub_items.send(start is None)
        except StopIteration:
            # If the subsequence is depleted, we win!
            return (start, index)
    return False

Method 14

I think this one is fast…

def issublist(subList, myList, start=0):
    if not subList: return 0
    lenList, lensubList = len(myList), len(subList)
    try:
        while lenList - start >= lensubList:
            start = myList.index(subList[0], start)
            for i in xrange(lensubList):
                if myList[start+i] != subList[i]:
                    break
            else:
                return start, start + lensubList - 1
            start += 1
        return False
    except:
        return False

Method 15

The problem of most of the answers, that they are good for unique items in list. If items are not unique and you still want to know whether there is an intersection, you should count items:

from collections import Counter as count

def listContains(l1, l2):
  list1 = count(l1)
  list2 = count(l2)

  return list1&list2 == list1

print( listContains([1,1,2,5], [1,2,3,5,1,2,1]) ) # Returns True
print( listContains([1,1,2,8], [1,2,3,5,1,2,1]) ) # Returns False

You can also return the intersection by using ''.join(list1&list2)

Method 16

Here a solution with less line of code and easily understandable (or at least I like to think so).

If you want to keep order (match only if the smaller list is found in the same order on the bigger list):

def is_ordered_subset(l1, l2):
    # First check to see if all element of l1 are in l2 (without checking order)
    if not set(l1).issubset(l2): 
        return False

    length = len(l1)
    # Make sublist of same size than l1
    list_of_sublist = [l2[i:i+length] for i, x in enumerate(l2)]
    #Check if one of this sublist is l1
    return l1 in list_of_sublist

Method 17

Dave answer is good. But I suggest this implementation which is more efficient and doesn’t use nested loops.

def contains(small_list, big_list):
    """
    Returns index of start of small_list in big_list if big_list
    contains small_list, otherwise -1.
    """
    loop = True
    i, curr_id_small= 0, 0
    while loop and i<len(big_list):
        if big_list[i]==small_list[curr_id_small]:
            if curr_id_small==len(small_list)-1:
                loop = False
            else:
                curr_id_small += 1
        else:
            curr_id_small = 0
        i=i+1
    if not loop:
        return i-len(small_list)
    else:
        return -1

Method 18

Here’s a simple and efficient function to check whether big list contains a small one in matching order:

def contains(big, small):
    i = 0
    for value in big:
        if value == small[i]:
            i += 1
            if i == len(small):
                return True
        else:
            i = 1 if value == small[0] else 0
    return False

Usage:

"""
>>> contains([1,2,3,4,5], [2,3,4])
True
>>> contains([4,2,3,2,4], [2,3,4])
False
>>> contains([1,2,3,2,3,2,2,4,3], [2,4,3])
True
"""

Method 19

You can use numpy:

def contains(l1, l2):
   """ returns True if l2 conatins l1 and False otherwise """

   if len(np.intersect1d(l1,l2))==len(l1):
      return = True
   else:
      return = False

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating