Extracting a Column from a CSV DataFrame, Processing It, and Returning It as a Dictionary

Shanyu · November 6, 2020, 4:56 PM

I have a dataset that I uploaded to the following link:
Link: https://s7.dosya.tc/server18/skpakr/NISPUF17.csv.html

import pandas as pd

def proportion_of_education():
    df = pd.read_csv(r"Dosya_Yolu\NISPUF17.csv")
    print(df["EDUC1"] != "College Graduate")

proportion_of_education()

From this dataset, I want to take the data in one category, say the EDUC1 column, and get it in this form:

{"less than high school":0.2,
    "high school":0.4,
    "more than high school but not college":0.2,
    "college":0.2}

Could you help me extend the code I started above so that it produces this output?

anon18277073 · November 6, 2020, 5:23 PM

df.EDUC1.value_counts(normalize=True).to_dict()

Shanyu · November 6, 2020, 6:57 PM

Building on your suggestion, I tried the following, but it did not work. It produces output, but it does not apply the calculation row by row; it gives the same value for every row. Where do you think my mistake is?

import pandas as pd

def proportion_of_education():
    
    df = pd.read_csv(r"_yol_\NISPUF17.csv")
    for i in range(len(df.index)):
        print(df.EDUC1[df.EDUC1 != "COLLEGE GRAD"].value_counts(normalize=True).to_dict())
        
proportion_of_education()

anon18277073 · November 6, 2020, 8:01 PM

You don't need the for loop at all. Each row holds a single value in the selected column, so what statistic could you compute from one row?

What this gives you is a histogram of the entire column.
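
To illustrate the point about the loop, here is a minimal sketch on a small made-up column (the sample values and the name `shares` are assumptions for illustration, not data from NISPUF17.csv). A single `value_counts(normalize=True)` call already summarizes the whole column, so repeating it inside a loop only prints the same dictionary again:

```python
import pandas as pd

# Hypothetical stand-in for the EDUC1 column; the real file is not read here.
df = pd.DataFrame({"EDUC1": ["12 YEARS", "COLLEGE GRAD", "12 YEARS", "< 12 YEARS"]})

# One call covers the entire column: each distinct value is mapped to
# the fraction of rows in which it appears.
shares = df["EDUC1"].value_counts(normalize=True).to_dict()
print(shares)
```

The result has one entry per distinct value rather than one per row, which is why no per-row iteration is involved.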

Shanyu · November 7, 2020, 1:32 PM

As far as I understand, the task wants me to pull the values that correspond to the keys above from the CSV file and return the output above as a dictionary.
Here is what I wrote:

import pandas as pd

def proportion_of_education():
    
    df = pd.read_csv(r"NISPUF17.csv")
    
    liste1=[]
    liste2=["less than high school", "high school", "more than high school but not college", "college"]
    
    totdat=len(df.index)
    totlesshighsdat=len(df.EDUC1[df.EDUC1 == "< 12 YEARS"])
    tothighgsdat=len(df.EDUC1[df.EDUC1 == "12 YEARS"])
    totmorehighsnotcdat=len(df.EDUC1[df.EDUC1 == "> 12 YEARS, NON-COLLEGE GRAD"])
    totcollagedat=len(df.EDUC1[df.EDUC1 == "COLLEGE GRAD"])
    
    x1=totlesshighsdat/totdat
    x2=tothighgsdat/totdat
    x3=totmorehighsnotcdat/totdat
    x4=totcollagedat/totdat
    liste1.append(x1)
    liste1.append(x2)
    liste1.append(x3)
    liste1.append(x4)
    
    print(dict(zip(liste2, liste1)))

But it still raises the following error on the assertions below:

assert type(proportion_of_education())==type({}), "You must return a dictionary."
assert len(proportion_of_education()) == 4, "You have not returned a dictionary with four items in it."
assert "less than high school" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
assert "high school" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
assert "more than high school but not college" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
assert "college" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."

{'less than high school': 0.0, 'high school': 0.0, 'more than high school but not college': 0.0, 'college': 0.0}
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-8-2eae51ff42e4> in <module>
----> 1 assert type(proportion_of_education())==type({}), "You must return a dictionary."
      2 assert len(proportion_of_education()) == 4, "You have not returned a dictionary with four items in it."
      3 assert "less than high school" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
      4 assert "high school" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
      5 assert "more than high school but not college" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."

AssertionError: You must return a dictionary.

But I did convert it to a dictionary, so why do you think this happens?

Shanyu · November 7, 2020, 1:42 PM
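
A likely cause, sketched with two hypothetical helper functions (`prints_only` and `returns_value` are illustrative names, not part of the assignment): a function that only calls `print` still returns `None`, so `type(...) == type({})` fails even though a dictionary appears on screen.

```python
def prints_only():
    result = {"college": 0.2}
    print(result)   # displays the dict, but the function still returns None


def returns_value():
    result = {"college": 0.2}
    return result   # hands the dict back to the caller


printed = prints_only()
returned = returns_value()
print(type(printed), type(returned))
```

Only the second form gives the assertions a dictionary to inspect.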

import pandas as pd

def proportion_of_education():
    
    df = pd.read_csv("assets/NISPUF17.csv")
    liste1=[]
    liste2=["less than high school", "high school", "more than high school but not college", "college"]
    totdat=len(df.index)
    totlesshighsdat=len(df.EDUC1[df.EDUC1 == "< 12 YEARS"])
    tothighgsdat=len(df.EDUC1[df.EDUC1 == "12 YEARS"])
    totmorehighsnotcdat=len(df.EDUC1[df.EDUC1 == "> 12 YEARS, NON-COLLEGE GRAD"])
    totcollagedat=len(df.EDUC1[df.EDUC1 == "COLLEGE GRAD"])
    
    x1=totlesshighsdat/totdat
    x2=tothighgsdat/totdat
    x3=totmorehighsnotcdat/totdat
    x4=totcollagedat/totdat
    liste1.append(x1)
    liste1.append(x2)
    liste1.append(x3)
    liste1.append(x4)

    diction=dict(zip(liste2, liste1))
    return diction

assert type(proportion_of_education())==type({}), "You must return a dictionary."
assert len(proportion_of_education()) == 4, "You have not returned a dictionary with four items in it."
assert "less than high school" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
assert "high school" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
assert "more than high school but not college" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
assert "college" in proportion_of_education().keys(), "You have not returned a dictionary with the correct keys."
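
For comparison, here is a shorter sketch that combines the `value_counts(normalize=True)` suggestion from earlier in the thread with a `return`. The names `LABELS` and `proportion_from_column` and the sample Series are assumptions for illustration, and the raw labels are simply the strings compared against in the code above. The all-zero output printed earlier suggests those strings may not match what is actually stored in the file; if the column holds different values (for example numeric codes), the keys of `LABELS` would need to change accordingly.

```python
import pandas as pd

# Assumed mapping from the raw EDUC1 labels used in this thread
# to the dictionary keys that the assertions expect.
LABELS = {
    "< 12 YEARS": "less than high school",
    "12 YEARS": "high school",
    "> 12 YEARS, NON-COLLEGE GRAD": "more than high school but not college",
    "COLLEGE GRAD": "college",
}


def proportion_from_column(educ):
    """Return the share of each education level as a plain dict."""
    shares = educ.value_counts(normalize=True)
    # A label that never occurs in the column gets 0.0 instead of a KeyError.
    return {key: float(shares.get(raw, 0.0)) for raw, key in LABELS.items()}


# Hypothetical sample standing in for df["EDUC1"].
sample = pd.Series([
    "12 YEARS", "12 YEARS", "COLLEGE GRAD",
    "< 12 YEARS", "> 12 YEARS, NON-COLLEGE GRAD",
])
result = proportion_from_column(sample)
print(result)
```

Note that `value_counts` skips missing values by default, so the shares here are relative to the non-null rows, whereas dividing by `len(df.index)` counts every row.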