• Can you help me with this memoization simple example?

    From marc nicole@mk1853387@gmail.com to comp.lang.python on Sun Mar 31 01:09:43 2024
    From Newsgroup: comp.lang.python

    I am creating a memoization example with a function that adds up / averages
    the elements of an array and compares it with the cached ones to retrieve
    them in case they are already stored.

    In addition, I want to store only if the result of the function differs considerably (passes a threshold e.g. 500000 below).

    I created an example using a decorator to do so, the results using the decorator is slightly faster than without the memoization which is OK, but
    is the logic of the decorator correct ? anybody can tell me ?

    My code is attached below:



    import time


    def memoize(f):
    cache = {}

    def g(*args):
    if args[1] == "avg":
    sum_key_arr = sum(list(args[0])) / len(list(args[0]))
    elif args[1] == "sum":
    sum_key_arr = sum(list(args[0]))
    if sum_key_arr not in cache:
    for (
    key,
    value,
    ) in (
    cache.items()
    ): # key in dict cannot be an array so I use the sum of the
    array as the key
    if (
    abs(sum_key_arr - key) <= 500000
    ): # threshold is great here so that all values are approximated!
    # print('approximated')
    return cache[key]
    else:
    # print('not approximated')
    cache[sum_key_arr] = f(args[0], args[1])
    return cache[sum_key_arr]

    return g


    @memoize
    def aggregate(dict_list_arr, operation):
    if operation == "avg":
    return sum(list(dict_list_arr)) / len(list(dict_list_arr))
    if operation == "sum":
    return sum(list(dict_list_arr))
    return None


    t = time.time()
    for i in range(200, 15000):
    res = aggregate(list(range(i)), "avg")

    elapsed = time.time() - t
    print(res)
    print(elapsed)
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From MRAB@python@mrabarnett.plus.com to comp.lang.python on Sun Mar 31 00:39:09 2024
    From Newsgroup: comp.lang.python

    On 2024-03-31 00:09, marc nicole via Python-list wrote:
    I am creating a memoization example with a function that adds up / averages the elements of an array and compares it with the cached ones to retrieve them in case they are already stored.

    In addition, I want to store only if the result of the function differs considerably (passes a threshold e.g. 500000 below).

    I created an example using a decorator to do so, the results using the decorator is slightly faster than without the memoization which is OK, but
    is the logic of the decorator correct ? anybody can tell me ?

    My code is attached below:



    import time


    def memoize(f):
    cache = {}

    def g(*args):
    if args[1] == "avg":
    sum_key_arr = sum(list(args[0])) / len(list(args[0]))

    'list' will iterate over args[0] to make a list, and 'sum' will iterate
    over that list.

    It would be simpler to just let 'sum' iterate over args[0].

    elif args[1] == "sum":
    sum_key_arr = sum(list(args[0]))
    if sum_key_arr not in cache:
    for (
    key,
    value,
    ) in (
    cache.items()
    ): # key in dict cannot be an array so I use the sum of the array as the key

    You can't use a list as a key, but you can use a tuple as a key,
    provided that the elements of the tuple are also immutable.

    if (
    abs(sum_key_arr - key) <= 500000
    ): # threshold is great here so that all values are approximated!
    # print('approximated')
    return cache[key]
    else:
    # print('not approximated')
    cache[sum_key_arr] = f(args[0], args[1])
    return cache[sum_key_arr]

    return g


    @memoize
    def aggregate(dict_list_arr, operation):
    if operation == "avg":
    return sum(list(dict_list_arr)) / len(list(dict_list_arr))
    if operation == "sum":
    return sum(list(dict_list_arr))
    return None


    t = time.time()
    for i in range(200, 15000):
    res = aggregate(list(range(i)), "avg")

    elapsed = time.time() - t
    print(res)
    print(elapsed)


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From marc nicole@mk1853387@gmail.com to comp.lang.python on Sun Mar 31 10:04:13 2024
    From Newsgroup: comp.lang.python

    Thanks for the first comment which I incorporated
    but when you say "You can't use a list as a key, but you can use a tuple as
    a key,
    provided that the elements of the tuple are also immutable."
    does it mean the result of sum of the array is not convenient to use as
    key as I do?
    Which tuple I should use to refer to the underlying list value as you
    suggest?
    Anything else is good in my code ?
    Thanks
    Le dim. 31 mars 2024 à 01:44, MRAB via Python-list <python-list@python.org>
    a écrit :
    On 2024-03-31 00:09, marc nicole via Python-list wrote:
    I am creating a memoization example with a function that adds up /
    averages
    the elements of an array and compares it with the cached ones to retrieve them in case they are already stored.

    In addition, I want to store only if the result of the function differs considerably (passes a threshold e.g. 500000 below).

    I created an example using a decorator to do so, the results using the decorator is slightly faster than without the memoization which is OK,
    but
    is the logic of the decorator correct ? anybody can tell me ?

    My code is attached below:



    import time


    def memoize(f):
    cache = {}

    def g(*args):
    if args[1] == "avg":
    sum_key_arr = sum(list(args[0])) / len(list(args[0]))

    'list' will iterate over args[0] to make a list, and 'sum' will iterate
    over that list.

    It would be simpler to just let 'sum' iterate over args[0].

    elif args[1] == "sum":
    sum_key_arr = sum(list(args[0]))
    if sum_key_arr not in cache:
    for (
    key,
    value,
    ) in (
    cache.items()
    ): # key in dict cannot be an array so I use the sum of the array as the key

    You can't use a list as a key, but you can use a tuple as a key,
    provided that the elements of the tuple are also immutable.

    if (
    abs(sum_key_arr - key) <= 500000
    ): # threshold is great here so that all values are approximated!
    # print('approximated')
    return cache[key]
    else:
    # print('not approximated')
    cache[sum_key_arr] = f(args[0], args[1])
    return cache[sum_key_arr]

    return g


    @memoize
    def aggregate(dict_list_arr, operation):
    if operation == "avg":
    return sum(list(dict_list_arr)) / len(list(dict_list_arr))
    if operation == "sum":
    return sum(list(dict_list_arr))
    return None


    t = time.time()
    for i in range(200, 15000):
    res = aggregate(list(range(i)), "avg")

    elapsed = time.time() - t
    print(res)
    print(elapsed)


    --
    https://mail.python.org/mailman/listinfo/python-list

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From MRAB@python@mrabarnett.plus.com to comp.lang.python on Sun Mar 31 20:23:50 2024
    From Newsgroup: comp.lang.python

    On 2024-03-31 09:04, marc nicole wrote:
    Thanks for the first comment which I incorporated

    but when you say "You can't use a list as a key, but you can use a
    tuple as a key,
    provided that the elements of the tuple are also immutable."

    does it mean  the result of sum of the array is not convenient to use
    as key as I do?
    Which tuple I should use to refer to the underlying list value as you suggest?

    I was suggesting using `tuple` on the argument:

    def memoize(f):
         cache = {}

         def g(*args):
             key = tuple(args[0]), args[1]

             if key not in cache:
                 cache[key] = f(args[0], args[1])

             return cache[key]

         return g

    Anything else is good in my code ?

    Thanks

    Le dim. 31 mars 2024 à 01:44, MRAB via Python-list <python-list@python.org> a écrit :

    On 2024-03-31 00:09, marc nicole via Python-list wrote:
    > I am creating a memoization example with a function that adds up
    / averages
    > the elements of an array and compares it with the cached ones to
    retrieve
    > them in case they are already stored.
    >
    > In addition, I want to store only if the result of the function
    differs
    > considerably (passes a threshold e.g. 500000 below).
    >
    > I created an example using a decorator to do so, the results
    using the
    > decorator is slightly faster than without the memoization which
    is OK, but
    > is the logic of the decorator correct ? anybody can tell me ?
    >
    > My code is attached below:
    >
    >
    >
    > import time
    >
    >
    > def memoize(f):
    >      cache = {}
    >
    >      def g(*args):
    >          if args[1] == "avg":
    >              sum_key_arr = sum(list(args[0])) / len(list(args[0]))

    'list' will iterate over args[0] to make a list, and 'sum' will
    iterate
    over that list.

    It would be simpler to just let 'sum' iterate over args[0].

    >          elif args[1] == "sum":
    >              sum_key_arr = sum(list(args[0]))
    >          if sum_key_arr not in cache:
    >              for (
    >                  key,
    >                  value,
    >              ) in (
    >                  cache.items()
    >              ):  # key in dict cannot be an array so I use the
    sum of the
    > array as the key

    You can't use a list as a key, but you can use a tuple as a key,
    provided that the elements of the tuple are also immutable.

    >                  if (
    >                      abs(sum_key_arr - key) <= 500000
    >                  ):  # threshold is great here so that all
    values are
    > approximated!
    >                      # print('approximated')
    >                      return cache[key]
    >              else:
    >                  # print('not approximated')
    >                  cache[sum_key_arr] = f(args[0], args[1])
    >          return cache[sum_key_arr]
    >
    >      return g
    >
    >
    > @memoize
    > def aggregate(dict_list_arr, operation):
    >      if operation == "avg":
    >          return sum(list(dict_list_arr)) / len(list(dict_list_arr))
    >      if operation == "sum":
    >          return sum(list(dict_list_arr))
    >      return None
    >
    >
    > t = time.time()
    > for i in range(200, 15000):
    >      res = aggregate(list(range(i)), "avg")
    >
    > elapsed = time.time() - t
    > print(res)
    > print(elapsed)


    --
    https://mail.python.org/mailman/listinfo/python-list

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From avi.e.gross@avi.e.gross@gmail.com to comp.lang.python on Sun Mar 31 16:56:19 2024
    From Newsgroup: comp.lang.python

    I am not sure if it was made clear that there is a general rule in python for what is HASHABLE and lists are changeable while tuples are not so the latter can be hashed as a simple copy of a list, albeit the contents must also be immutable.
    The memorize function uses a dictionary to store things and thus the things are hashed to decide how to store it in the inner representation of a dictionary and anything new that you want to look up in the dictionary has similar considerations as it is hashed to see where in the dictionary to look for it.
    Of course, if you add enough overhead and the memorize function you make gets relatively few requests that are identical, it may not be worthwhile.
    -----Original Message-----
    From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of MRAB via Python-list
    Sent: Sunday, March 31, 2024 3:24 PM
    To: python-list@python.org
    Subject: Re: Can you help me with this memoization simple example?
    On 2024-03-31 09:04, marc nicole wrote:
    Thanks for the first comment which I incorporated

    but when you say "You can't use a list as a key, but you can use a
    tuple as a key,
    provided that the elements of the tuple are also immutable."

    does it mean the result of sum of the array is not convenient to use
    as key as I do?
    Which tuple I should use to refer to the underlying list value as you suggest?

    I was suggesting using `tuple` on the argument:
    def memoize(f):
    cache = {}
    def g(*args):
    key = tuple(args[0]), args[1]
    if key not in cache:
    cache[key] = f(args[0], args[1])
    return cache[key]
    return g
    Anything else is good in my code ?

    Thanks

    Le dim. 31 mars 2024 à 01:44, MRAB via Python-list
    <python-list@python.org> a écrit :

    On 2024-03-31 00:09, marc nicole via Python-list wrote:
    > I am creating a memoization example with a function that adds up
    / averages
    > the elements of an array and compares it with the cached ones to
    retrieve
    > them in case they are already stored.
    >
    > In addition, I want to store only if the result of the function
    differs
    > considerably (passes a threshold e.g. 500000 below).
    >
    > I created an example using a decorator to do so, the results
    using the
    > decorator is slightly faster than without the memoization which
    is OK, but
    > is the logic of the decorator correct ? anybody can tell me ?
    >
    > My code is attached below:
    >
    >
    >
    > import time
    >
    >
    > def memoize(f):
    > cache = {}
    >
    > def g(*args):
    > if args[1] == "avg":
    > sum_key_arr = sum(list(args[0])) / len(list(args[0]))

    'list' will iterate over args[0] to make a list, and 'sum' will
    iterate
    over that list.

    It would be simpler to just let 'sum' iterate over args[0].

    > elif args[1] == "sum":
    > sum_key_arr = sum(list(args[0]))
    > if sum_key_arr not in cache:
    > for (
    > key,
    > value,
    > ) in (
    > cache.items()
    > ): # key in dict cannot be an array so I use the
    sum of the
    > array as the key

    You can't use a list as a key, but you can use a tuple as a key,
    provided that the elements of the tuple are also immutable.

    > if (
    > abs(sum_key_arr - key) <= 500000
    > ): # threshold is great here so that all
    values are
    > approximated!
    > # print('approximated')
    > return cache[key]
    > else:
    > # print('not approximated')
    > cache[sum_key_arr] = f(args[0], args[1])
    > return cache[sum_key_arr]
    >
    > return g
    >
    >
    > @memoize
    > def aggregate(dict_list_arr, operation):
    > if operation == "avg":
    > return sum(list(dict_list_arr)) / len(list(dict_list_arr))
    > if operation == "sum":
    > return sum(list(dict_list_arr))
    > return None
    >
    >
    > t = time.time()
    > for i in range(200, 15000):
    > res = aggregate(list(range(i)), "avg")
    >
    > elapsed = time.time() - t
    > print(res)
    > print(elapsed)


    --
    https://mail.python.org/mailman/listinfo/python-list

    --
    https://mail.python.org/mailman/listinfo/python-list
    --- Synchronet 3.20a-Linux NewsLink 1.114