Producing cleanly formatted text with BeautifulSoup and requests

For my automatic package downloader project I created an info button. I fetch the information from https://pypi.org/project/selenium/ with

find_all("p")

but two problems come up:
1) The fetched p tags do not come out tidy; they contain stray whitespace.
2) I need to get rid of the whitespace and strip the tags themselves.

First, let's take a look at this code:

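from bs4 import BeautifulSoup
import requests

# Download the project page and parse it.
web_html = requests.get("https://pypi.org/project/selenium/")
soup = BeautifulSoup(web_html.text, 'html.parser')

# Print every <p> tag exactly as it appears in the page.
for tag in soup.find_all('p'):
    print(tag)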

Output:

<p class="release__version-date">
<time data-controller="localized-time" data-localized-time-relative="true" data-localized-time-show-time="false" datetime="2010-05-26T04:19:40+0000">
  May 26, 2010
</time>
</p>
<p class="release__version">
                  1.0.1


                </p>
<p class="release__version-date">
<time data-controller="localized-time" data-localized-time-relative="true" data-localized-time-show-time="false" datetime="2009-06-17T18:58:25+0000">
  Jun 17, 2009
</time>
</p>
<p class="release__version">
                  0.9.2

The whitespace is already part of the text inside the p tags on the page itself. In other words, the values you fetched are not corrupted; the module simply prints whatever is there.
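To see that the whitespace really lives in the markup, here is a small sketch. The `html` string is a hypothetical snippet imitating one of the tags printed above (it is not fetched from PyPI), and `raw_text` / `clean_text` are names I made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet imitating the markup shown in the output above.
html = '<p class="release__version">\n                  1.0.1\n\n\n                </p>'

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("p")

# repr() makes the newlines and spaces stored in the text node visible.
raw_text = tag.string
print(repr(raw_text))

# str.strip() removes the leading and trailing whitespace.
clean_text = raw_text.strip()
print(repr(clean_text))
```

The first print shows the padding exactly as the page author wrote it; the second shows only the version number.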

So how do I get rid of this whitespace?

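from bs4 import BeautifulSoup
import requests

web_html = requests.get("https://pypi.org/project/selenium/")
soup = BeautifulSoup(web_html.text, 'html.parser')

for tag in soup.find_all('p'):
    # Strip each line of the tag's text and join the pieces back together.
    # tag.string is None for some tags, so str() turns it into "None".
    strings = "".join(line.strip() for line in str(tag.string).split("\n"))
    print(strings)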

Output:

None
None
Python bindings for Selenium
None
None
Python language bindings for Selenium WebDriver.
None
Several browsers/drivers are supported (Firefox, Chrome, Internet Explorer), as well as the Remote protocol.
None
None
None
None
None
Other supported browsers will have their own drivers available. Links to some of the more popular browser drivers follow.
None
For normal WebDriver scripts (non-Remote), the Java server is not needed.
However, to use Selenium Webdriver Remote or the legacy Selenium API (Selenium-RC), you need to also run the Selenium server.  The server requires a Java Runtime Environment (JRE).
None
Run the server from the command line:
Then run your Python client scripts.
View source code online:
None
None
None
None
None
None
None
None
3.141.0
None
3.14.1
None
3.14.0
None
3.13.0
None
3.12.0
None
3.11.0
None
3.10.0
None
3.9.0
None
3.8.1
None
3.8.0
None
3.7.0
None
3.6.0
None
3.5.0
None
3.4.3
None
3.4.2
None
3.4.1
None
3.4.0
None
3.3.3
None
3.3.2
None
3.3.1
None
3.3.0
None
3.0.2
None
3.0.1
None
3.0.0
None
None
None
None
None
None
None
2.53.6
None
2.53.5
None
2.53.4
None
2.53.3
None
2.53.2
None
2.53.1
None
2.53.0
None
2.52.0
None
2.51.1
None
2.51.0
None
2.50.1
None
2.50.0
None
2.49.2
None
2.49.1
None
2.49.0
None
2.48.0
None
2.47.3
None
2.47.2
None
2.47.1
None
2.47.0
None
2.46.1
None
2.46.0
None
2.45.0
None
2.44.0
None
2.43.0
None
2.42.1
None
2.42.0
None
2.41.0
None
2.40.0
None
2.39.0
None
2.38.4
None
2.38.3
None
2.38.2
None
2.38.1
None
2.38.0
None
2.37.2
None
2.37.1
None
2.37.0
None
2.36.0
None
2.35.0
None
2.34.0
None
2.33.0
None
2.32.0
None
2.31.0
None
2.30.0
None
2.29.0
None
2.28.0
None
2.27.0
None
2.26.0
None
2.25.0
None
2.24.0
None
2.23.0
None
2.22.1
None
2.22.0
None
2.21.3
None
2.21.2
None
2.21.1
None
2.21.0
None
2.20.0
None
2.19.1
None
2.19.0
None
2.18.1
None
2.17.0
None
2.16.0
None
2.15.0
None
2.14.0
None
2.13.1
None
2.13.0
None
2.12.1
None
2.12.0
None
2.11.1
None
2.11.0
None
2.10.0
None
2.9.0
None
2.8.1
None
2.8.0
None
2.7.0
None
2.6.0
None
2.5.0
None
2.4.0
None
2.3.0
None
2.2.0
None
2.1.0
None
2.0.1
None
2.0.0
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
1.0.3
None
1.0.1
None
0.9.2
None
None
None
None
None
Supported by

You can pick the class you need and work on that.
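As a rough sketch of what picking a class could look like: the class names `release__version` and `release__version-date` come from the output earlier in the thread, while the `html` string is a hypothetical stand-in for the real page, so treat this as an illustration rather than a tested scraper.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the markup printed earlier in the thread.
html = """
<p class="release__version-date">
<time datetime="2010-05-26T04:19:40+0000">
  May 26, 2010
</time>
</p>
<p class="release__version">
                  1.0.1

</p>
<p class="release__version">
                  0.9.2
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ restricts the search to <p> tags carrying that CSS class;
# get_text(strip=True) returns the text with surrounding whitespace removed.
versions = [p.get_text(strip=True) for p in soup.find_all("p", class_="release__version")]
dates = [p.get_text(strip=True) for p in soup.find_all("p", class_="release__version-date")]

print(versions)
print(dates)
```

Unlike `.string`, `get_text()` also works when the p tag wraps another tag such as `time`, so the dates come out as plain text too.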

Have a nice day…


I only need to get rid of the None values.

Simply :)

from bs4 import BeautifulSoup
import requests

web_html = requests.get("https://pypi.org/project/selenium/")
soup = BeautifulSoup(web_html.text, 'html.parser')

for tag in soup.find_all('p'):
    strings = "".join(line.strip() for line in str(tag.string).split("\n"))
    # str(None) is the text "None", so skip those entries.
    if strings != "None":
        print(strings)

Output:

Python bindings for Selenium
Python language bindings for Selenium WebDriver.
Several browsers/drivers are supported (Firefox, Chrome, Internet Explorer), as well as the Remote protocol.
Other supported browsers will have their own drivers available. Links to some of the more popular browser drivers follow.
For normal WebDriver scripts (non-Remote), the Java server is not needed.
However, to use Selenium Webdriver Remote or the legacy Selenium API (Selenium-RC), you need to also run the Selenium server.  The server requires a Java Runtime Environment (JRE).
Run the server from the command line:
Then run your Python client scripts.
View source code online:
3.141.0
3.14.1
3.14.0
3.13.0
3.12.0
3.11.0
3.10.0
3.9.0
3.8.1
3.8.0
3.7.0
3.6.0
3.5.0
3.4.3
3.4.2
3.4.1
3.4.0
3.3.3
3.3.2
3.3.1
3.3.0
3.0.2
3.0.1
3.0.0
2.53.6
2.53.5
2.53.4
2.53.3
2.53.2
2.53.1
2.53.0
2.52.0
2.51.1
2.51.0
2.50.1
2.50.0
2.49.2
2.49.1
2.49.0
2.48.0
2.47.3
2.47.2
2.47.1
2.47.0
2.46.1
2.46.0
2.45.0
2.44.0
2.43.0
2.42.1
2.42.0
2.41.0
2.40.0
2.39.0
2.38.4
2.38.3
2.38.2
2.38.1
2.38.0
2.37.2
2.37.1
2.37.0
2.36.0
2.35.0
2.34.0
2.33.0
2.32.0
2.31.0
2.30.0
2.29.0
2.28.0
2.27.0
2.26.0
2.25.0
2.24.0
2.23.0
2.22.1
2.22.0
2.21.3
2.21.2
2.21.1
2.21.0
2.20.0
2.19.1
2.19.0
2.18.1
2.17.0
2.16.0
2.15.0
2.14.0
2.13.1
2.13.0
2.12.1
2.12.0
2.11.1
2.11.0
2.10.0
2.9.0
2.8.1
2.8.0
2.7.0
2.6.0
2.5.0
2.4.0
2.3.0
2.2.0
2.1.0
2.0.1
2.0.0
1.0.3
1.0.1
0.9.2
Supported by
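A side note on where the None values come from: `.string` returns None when a tag has more than one child, for example a p tag that wraps a `time` tag with whitespace around it. Comparing against the text "None" works, but it would also drop a paragraph whose real text is the word None. Below is a sketch of an alternative that tests for None directly; the `html` string is a hypothetical sample and `lines` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a plain paragraph, a paragraph wrapping <time>,
# and a padded version number.
html = """
<p>Python bindings for Selenium</p>
<p>
<time datetime="2009-06-17T18:58:25+0000">
  Jun 17, 2009
</time>
</p>
<p>
   0.9.2
</p>
"""

soup = BeautifulSoup(html, "html.parser")

lines = []
for tag in soup.find_all("p"):
    # .string is None when the tag contains several children.
    if tag.string is None:
        continue
    # Same cleanup as in the thread: strip each line, then join.
    lines.append("".join(line.strip() for line in tag.string.split("\n")))

print(lines)
```

The paragraph wrapping the `time` tag is skipped, which matches the behaviour of the code above without going through `str()`.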

Just one more question: could you also remove the int values, please?

from bs4 import BeautifulSoup

import requests

web_html = requests.get("https://pypi.org/project/selenium/")

soup = BeautifulSoup(web_html.text,'html.parser')

for tag in soup.find_all('p'):

    strings = "".join(line.strip() for line in str(tag.string).split("\n"))

    if strings != "None":
    if strings != "0":
    if strings != "1":
    if strings != "2":
    if strings != "3":
    if strings != "4":
    if strings != "5":
    if strings != "6":
    if strings != "7":
    if strings != "8":
    if strings != "9":


        print(strings)

Careful, I'm still very new at this, so don't trust me. Try it and maybe it will work :D Above, None was filtered out that way, so I figured the int values could be blocked the same way and wrote this :D


I think it would work if I did it with a list.


What I wrote is the brute-force way. Really, you could build a sequence of unwanted values and say: if the string matches one of them, don't print it. That is what the earlier answer does anyway:
if strings is not equal to "None" (if strings != "None":),
print it.


Well done to both of you. To help you improve, I'll give a small example; with a little extension
you can apply it to this program too ;)

number = "3.4.2.1"
# Keep only the characters that are digits.
list_n = [value for value in number if value.isdigit()]

print(list_n)

Output:
['3', '4', '2', '1']
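One possible way to extend this idea to the version numbers is sketched below. The helper name `is_version_like` and the sample `lines` list are hypothetical; the rule it applies (only digits and dots, with at least one digit) is my assumption about what "int values" means here.

```python
def is_version_like(text):
    """Return True if text is made up only of digits and dots."""
    digits = [value for value in text if value.isdigit()]
    # Every character must be a digit or a dot, and at least one digit is needed.
    return len(digits) > 0 and len(digits) + text.count(".") == len(text)


# Hypothetical sample taken from the kind of lines printed above.
lines = ["Python bindings for Selenium", "3.141.0", "2.0.1", "Supported by"]

# Keep only the lines that do not look like a version number.
kept = [line for line in lines if not is_version_like(line)]
print(kept)
```

In the loop from the thread, this would amount to printing `strings` only when `is_version_like(strings)` is false.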

You can't stack if statements one under another at the same indentation level like that, but you can use and.
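A minimal sketch of that suggestion, shortened to three values to keep it readable. The function names `keep_with_and` and `keep_with_membership` and the `samples` list are made up for illustration.

```python
def keep_with_and(text):
    # One condition joined with `and` instead of several stacked `if` lines.
    return text != "None" and text != "0" and text != "1"


def keep_with_membership(text):
    # The same test written as a membership check against a tuple.
    return text not in ("None", "0", "1")


samples = ["None", "0", "1", "Supported by", "3.141.0"]

kept_and = [s for s in samples if keep_with_and(s)]
kept_membership = [s for s in samples if keep_with_membership(s)]

print(kept_and)
print(kept_membership)
```

Note that "3.141.0" survives both checks: comparing against single digits only removes strings that are exactly one digit, so it does not filter out whole version numbers.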
