pandas generate columns of cumsums based on variable names in two different columns

I have a dataframe as follows:
import pandas
import numpy

df = pandas.DataFrame(data={
    's1': numpy.random.choice(['A', 'B', 'C', 'D', 'E'], size=20),
    's2': numpy.random.choice(['A', 'B', 'C', 'D', 'E'], size=20),
    'val': numpy.random.randint(low=-1, high=3, size=20),
})

I want to generate two result columns that give a cumulative sum of a value (val) for the categories in 's1' and 's2'.
A category ('A', 'B', 'C', etc.) can appear in either s1 or s2. The first time a category appears in either column, its result is zero; each later time it appears, its result is the sum of val over all the earlier rows in which it appeared, in either column.
An example dataframe could look as follows:
  s1 s2  val  ans1  ans2
0  E  B    1   0.0   0.0
1  E  C    1   1.0   0.0
2  E  A    2   2.0   0.0
3  B  A    0   1.0   2.0
4  E  B    1   4.0   1.0
5  B  C    1   2.0   1.0
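
To make the rule concrete, here is a small sketch that rebuilds the six example rows and traces one category, 'B', by hand. The names `example`, `mask_b`, `b_vals` and `before` are illustrative only and are not part of the original question.

```python
import pandas

# The six example rows from the table above (answer columns left out).
example = pandas.DataFrame({
    's1': ['E', 'E', 'E', 'B', 'E', 'B'],
    's2': ['B', 'C', 'A', 'A', 'B', 'C'],
    'val': [1, 1, 2, 0, 1, 1],
})

# Rows in which 'B' appears in either category column.
mask_b = (example['s1'] == 'B') | (example['s2'] == 'B')
b_vals = example.loc[mask_b, 'val']

# Total of val over the strictly earlier rows that contained 'B':
# the inclusive running sum minus the current row's own val.
before = b_vals.cumsum() - b_vals
print(before)
```

The values in `before` line up with the table: 'B' is in s2 on rows 0 and 4 (ans2 is 0.0 and 1.0) and in s1 on rows 3 and 5 (ans1 is 1.0 and 2.0).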

I can generate the correct answer columns (ans1 and ans2, corresponding to the s1 and s2 columns) as follows:
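temp = {}  # running total of val per category, shared by s1 and s2
df['ans1'] = numpy.nan
df['ans2'] = numpy.nan
for idx, row in df.iterrows():
    if row['s1'] in temp:
        # Seen before: report the total so far, then add this row's val.
        df.loc[idx, 'ans1'] = temp[row['s1']]
        temp[row['s1']] = temp[row['s1']] + row['val']
    else:
        # First appearance: start the total at val and report zero.
        temp[row['s1']] = row['val']
        df.loc[idx, 'ans1'] = 0

    # Same logic for s2, reading and updating the same dictionary.
    if row['s2'] in temp:
        df.loc[idx, 'ans2'] = temp[row['s2']]
        temp[row['s2']] = temp[row['s2']] + row['val']
    else:
        temp[row['s2']] = row['val']
        df.loc[idx, 'ans2'] = 0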

Using 'temp' as a dictionary to hold the running total of each category (A-E), I can get the two answer columns.
What I can't do is find a solution that does not iterate over each row of the dataframe.
I don't have an issue in the case with only s1, where I can use .groupby().cumsum().shift(1) and get the correct values in the correct rows. But I cannot find a solution when there are two sets, s1 and s2 (or more, as I have multiple sensors to track), so I am hoping there is a more general, vectorised solution that will work?
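
One possible direction, offered as a sketch rather than a definitive answer: flatten the category columns row by row into one long column, take a per-category running sum, subtract the current value, and fold the result back into one answer column per category column. The function name `totals_before_each_row` and its parameters are hypothetical, and the sketch assumes there are no missing values in the category columns.

```python
import numpy
import pandas


def totals_before_each_row(df, cat_cols, val_col):
    """For each cell in cat_cols, sum val_col over earlier appearances of that category."""
    n_rows, n_cols = len(df), len(cat_cols)
    # Flatten the category columns row by row:
    # (row 0, s1), (row 0, s2), (row 1, s1), (row 1, s2), ...
    long = pandas.DataFrame({
        'cat': df[cat_cols].to_numpy().ravel(),
        'val': numpy.repeat(df[val_col].to_numpy(), n_cols),
    })
    # Inclusive running total per category, minus the current value,
    # leaves the total of strictly earlier appearances (zero on first sight).
    before = long.groupby('cat')['val'].cumsum() - long['val']
    # Fold the flat result back into one answer column per category column.
    return pandas.DataFrame(
        before.to_numpy().reshape(n_rows, n_cols),
        index=df.index,
        columns=[f'ans{i + 1}' for i in range(n_cols)],
    )


# The six example rows from the question.
example = pandas.DataFrame({
    's1': ['E', 'E', 'E', 'B', 'E', 'B'],
    's2': ['B', 'C', 'A', 'A', 'B', 'C'],
    'val': [1, 1, 2, 0, 1, 1],
})
answers = totals_before_each_row(example, ['s1', 's2'], 'val')
print(example.join(answers))
```

On the six example rows this reproduces the ans1 and ans2 values from the table, as integers rather than floats. Because the flattened order visits s1 before s2 within a row, a row where s1 and s2 hold the same category is treated the way the loop treats it: the s2 cell already includes that row's val. More category columns can be handled by passing a longer `cat_cols` list.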

Tags: python, pandas

Source of the question:
https://stackoverflow.com/questions/7...

Question and source license information:
https://meta.stackexchange.com/help/l...
https://stackoverflow.com/
