Introducing the Top 5 Python Libraries for Machine Learning

In this article we introduce five machine learning libraries that are widely used by Python practitioners, so stay with me.

5 prominent libraries for machine learning with Python

NumPy
Pandas
Matplotlib
SciKit-Learn
NLTK

Below, we review each of these libraries with examples.

The NumPy library in Python

NumPy is one of the fundamental packages for scientific computing in Python. It is most often used to solve matrix problems.

Example: creating an array with NumPy in Python

>>> import numpy as np
>>> arr = np.array([])
>>> type(arr)
numpy.ndarray

Example: creating a one-dimensional array with NumPy in Python

>>> one_d_array = np.array([1, 2, 3, 4, 5])
# the ndim attribute gives the number of dimensions of an array
>>> one_d_array.ndim
1
# the size attribute gives the total number of elements in the array
>>> one_d_array.size
5

Example: creating an array initialized with zeros using NumPy in Python:

>>> np.zeros(5) # produces floats by default
array([0., 0., 0., 0., 0.])

# zeros() accepts a dtype argument to choose the data type
>>> np.zeros(5, dtype=int)
array([0, 0, 0, 0, 0])

Example: creating a sequential array with NumPy in Python

# if a single argument is passed, the sequence starts from 0
>>> print(np.arange(10))
[0 1 2 3 4 5 6 7 8 9]

# the first argument is the starting point
# the second argument is the end point (excluded)
# if the third argument is not given, 1 is used as the default step
>>> print(np.arange(1, 10))
[1 2 3 4 5 6 7 8 9]
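
The comments above mention a third (step) argument without showing it in use. Here is a minimal sketch of how it behaves. The specific values are only illustrative, and `np.linspace` is an addition of mine, included to contrast a fixed step with a fixed number of points.

```python
import numpy as np

# Third argument is the step: start at 1, stop before 10, step by 2.
odds = np.arange(1, 10, 2)
print(odds)

# A negative step counts down; the stop value is still excluded.
countdown = np.arange(10, 0, -3)
print(countdown)

# linspace takes the number of points instead of a step
# and includes the end point by default.
points = np.linspace(0.0, 1.0, 5)
print(points)
```

As a rule of thumb, `arange` fits integer steps, while `linspace` is usually the safer choice for fractional spacing because the number of points is stated explicitly.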

Example: reshaping an array with NumPy in Python

>>> np.arange(10).reshape(2, 5) # 1-D array reshaped into 2-D
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

# flatten an array
>>> np.arange(10).reshape(2, 5).ravel()
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# transpose an array
>>> np.arange(10).reshape(2, 5).T
array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])
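
This section says NumPy is mostly used for matrix problems, so a short matrix example may help. The sketch below multiplies two matrices and solves a small linear system. The matrices `A`, `B` and the vector `b` are made-up values chosen only for illustration.

```python
import numpy as np

# Two small 2x2 matrices (values chosen only for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# `@` is matrix multiplication; `*` would multiply element by element.
product = A @ B
print(product)

# Solve the linear system A x = b for x.
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)
```

Multiplying `A @ x` should give back `b` (up to floating-point rounding), which is a quick way to check the solution.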

For further reading on the NumPy library, see this link.

The Pandas library in Python for machine learning:

Another widely used library in machine learning is Pandas. It is used for data analysis and data manipulation in Python.

Installing Pandas in Python:

To install this library, open cmd (a command prompt), type the command below, and wait for Pandas to finish installing.

pip install pandas

Example: creating a Series in Python. A Series is a labeled one-dimensional array, as shown below.

>>> import pandas as pd
>>> pd.Series([1,2,3,4,5])
0    1
1    2
2    3
3    4
4    5
dtype: int64
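
The numbers 0 to 4 in the left column of that output are the default labels. As a minimal sketch of what "labeled" means in practice, the example below gives a Series explicit labels. The subject names and marks are hypothetical values, not data from this article.

```python
import pandas as pd

# Hypothetical marks, indexed by subject name instead of 0..n-1.
marks = pd.Series([90, 70, 33], index=["Math", "Bangla", "English"])

# Look up a single value by its label.
print(marks["Math"])

# Boolean filtering keeps the labels attached to the surviving values.
passed = marks[marks >= 40]
print(passed)
```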

Example: creating a DataFrame in Python:

A DataFrame is a labeled two-dimensional data structure, as in the example below.

# final exam result of 10 students
>>> name = ["Nasir", "Islam", "Sujan", "Sagor", "Jamal", "Rony", "Rana", "Shahin", "Jony", "Sumon"]
>>> math = [99, 58, 30, 40, 70, 77, 83, 68, 23, 0]
>>> english = [78, 67, 34, 33, 32, 21, 45, 89, 95, 10]
>>> physics = [20, 50, 55, 43, 78, 87, 46, 98, 69, 35]

# now we want to create a result DataFrame
>>> result = pd.DataFrame({
               "Name" : name,
               "Math" : math,
               "English": english,
               "Physics" : physics
             })
>>> result
# output shown on below table

Head and tail of a DataFrame in Python

# DataFrame.head() returns first 5 rows of a DataFrame
>>> print(result.head()) 
   Name  Math  English  Physics
0  Nasir    99       78       20
1  Islam    58       67       50
2  Sujan    30       34       55
3  Sagor    40       33       43
4  Jamal    70       32       78

# DataFrame.tail() returns the last 5 rows of a DataFrame
>>> print(result.tail())
     Name  Math  English  Physics
5    Rony    77       21       87
6    Rana    83       45       46
7  Shahin    68       89       98
8    Jony    23       95       69
9   Sumon     0       10       35

# head() and tail() take an optional argument that specifies the number of rows
>>> print(result.head(2))
    Name  Math  English  Physics
0  Nasir    99       78       20
1  Islam    58       67       50

Statistical summary of a DataFrame in Python with Pandas

>>> print(result.describe())
            Math   English    Physics
count  10.000000  10.00000  10.000000
mean   54.800000  50.40000  58.100000
std    30.741937  29.72541  24.442449
min     0.000000  10.00000  20.000000
25%    32.500000  32.25000  43.750000
50%    63.000000  39.50000  52.500000
75%    75.250000  75.25000  75.750000
max    99.000000  95.00000  98.000000

Example: selecting one or more columns (attributes) of a DataFrame in Python

# accessing attribute with `.` is only applicable when there is no space in the attribute name.
>>> result.Name.head(2) 
0    Nasir
1    Islam
Name: Name, dtype: object

# convenient way of accessing attribute
>>> result["Name"].head(2)
0    Nasir
1    Islam
Name: Name, dtype: object

# accessing multiple attribute
>>> result[["Name", "Math"]].head(2)
    Name  Math
0  Nasir    99
1  Islam    58
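
Sorting rows by one or more columns is a closely related operation. The sketch below uses `sort_values`, the standard pandas method for this. It assumes a cut-down stand-in for the `result` DataFrame, containing only four of the rows shown earlier.

```python
import pandas as pd

# A smaller stand-in for the `result` DataFrame used in this section.
result = pd.DataFrame({
    "Name": ["Nasir", "Islam", "Sujan", "Sagor"],
    "Math": [99, 58, 30, 40],
    "English": [78, 67, 34, 33],
})

# Sort by a single column, highest mark first.
by_math = result.sort_values("Math", ascending=False)
print(by_math)

# Sort by several columns: English ascending, ties broken by Math descending.
by_both = result.sort_values(["English", "Math"], ascending=[True, False])
print(by_both[["Name", "English", "Math"]])
```

`sort_values` returns a new DataFrame and leaves `result` unchanged unless `inplace=True` is passed.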

Some of the basic operations you can perform on a DataFrame:

# delete or drop an attribute/column
>>> del result["Name"]

# Removing multiple attribute/column
# `del` can only remove a single column at a time
# for removing multiple columns we use drop() method
# `axis = 1` means column, `inplace = True` means permanent
>>> result.drop(["Math", "Physics"], axis=1, inplace=True)

# the columns above are now gone, so re-create the original
# `result` DataFrame before running the lines below

# rename columns
>>> result.rename(columns={
      "Math": "Social Science", 
      "Physics" : "Biology",
      "English": "Chemistry"}, inplace=True)
      
# shape of a DataFrame
>>> result.shape
(10, 4)

# Attribute/Column names
>>> result.columns
Index(['Name', 'Social Science', 'Chemistry', 'Biology'], dtype='object')

Conditional search in a DataFrame

# find out the names of students who have failed in Chemistry.
>>> print(result["Name"][result["Chemistry"]<33])
4    Jamal
5     Rony
9    Sumon
Name: Name, dtype: object
    
# find out the names of students who have achieved more than 60 in all subjects.
>>> result["Name"][(result["Chemistry"]>60) & (result["Biology"]>60) & (result["Social Science"]>60)]
7    Shahin
Name: Name, dtype: object
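
The same kind of filter can also be written with `.loc`, which selects rows and a column in a single step. This is only a sketch of an alternative style. It assumes a four-row stand-in for `result` with the renamed columns, and the variable names `all_above_60` and `below_60_somewhere` are my own.

```python
import pandas as pd

# Stand-in data with the renamed columns from the section above.
result = pd.DataFrame({
    "Name": ["Nasir", "Islam", "Shahin", "Jony"],
    "Social Science": [99, 58, 68, 23],
    "Chemistry": [78, 67, 89, 95],
    "Biology": [20, 50, 98, 69],
})

# Build the condition once: True where every subject mark is above 60.
subjects = ["Social Science", "Chemistry", "Biology"]
all_above_60 = (result[subjects] > 60).all(axis=1)

# .loc takes the row condition first and the column to return second.
names = result.loc[all_above_60, "Name"]
print(names.tolist())

# `|` means "or"; each comparison needs its own parentheses.
below_60_somewhere = result.loc[
    (result["Biology"] < 60) | (result["Social Science"] < 60), "Name"
]
print(below_60_somewhere.tolist())
```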

The Matplotlib library in Python: an excellent library for data visualization in Python.

Example: plotting the distribution of a feature in Python with Matplotlib

import pandas as pd
import matplotlib.pyplot as plt
# notebook-only magic that renders plots inline in Jupyter/IPython;
# remove this line when running as a plain Python script
%matplotlib inline

dataset = pd.read_csv("../dataset/student_result.csv")

# plot the distribution of the `result` column
# `result` holds two values: 1 means pass and 0 means fail
dataset.result.value_counts().plot.bar()

Example: drawing a bar chart in Python

student_names = ["Jamal", 'Kamal', "Rony", "Jony", "Sumon"]
math_result = dataset["math"][:5]

plt.bar(student_names, math_result)
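
The snippet above depends on a CSV file that is not included with this article. The sketch below is a self-contained version of the same bar chart that runs without it. The marks are stand-in values, and the `Agg` backend and the output file name `math_result.png` are my assumptions, chosen so the script works without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Stand-in marks replacing dataset["math"][:5].
student_names = ["Jamal", "Kamal", "Rony", "Jony", "Sumon"]
math_result = [70, 30, 50, 80, 33]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(student_names, math_result)
ax.set_xlabel("Student")
ax.set_ylabel("Math mark")
ax.set_title("Math marks of five students")

# Write the chart to a file instead of opening a window.
fig.savefig("math_result.png")
```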

Changing the parameters of a bar chart in Python

# adding a title and a legend
# create a figure with 1 row and 2 columns of subplots,
# 13 wide and 5 high
fig, ax = plt.subplots(1, 2, figsize=(13, 5))
ax[0].bar(student_names, math_result, label="M")
ax[0].legend()
# set the title on the axes itself; plt.title() only affects the current axes
ax[0].set_title("Final exam math result")

# increase the font size and change the bar color
ax[1].bar(student_names, math_result, color="orange", label="M")
ax[1].legend()
ax[1].set_title("Final exam math result", fontsize=18)
ax[1].tick_params(axis="both", labelsize=13)
plt.show()

Drawing a pie chart in Python

subject_names = ["Math", "Bangla" , "English", "Physics", "Chemistry"]
subject_marks = [90, 70, 33, 68, 47]

colors = ["#ffd50544", "#7952b399", "#ff222244", "#007bff44", "#262c3a44"]

plt.pie(subject_marks, labels = subject_names, autopct='%1.1f%%', startangle=90, colors=colors)
plt.axis('equal') 
plt.show()

Getting to know the SciKit-Learn library for machine learning in Python

SciKit-Learn provides a wide range of supervised and unsupervised learning algorithms.

Note: to run the code below, first install the library on your system with the following command:

pip3 install -U scikit-learn

Example: classifying data in Python with SciKit-Learn

We will use the student result dataset, which contains two classes: 0 (the student failed) and 1 (the student passed).

Every row of this dataset carries a label that takes one of two possible values, so this is a supervised classification problem.

Step 1: Import the required libraries in Python
Step 2: Load the dataset
Step 3: Split the dataset into training and test sets
Step 4: Build the model and train it on the training data
Step 5: Evaluate the model on the test data

Below, we code these 5 steps in Python:

Step 1: Import the required libraries in Python

import numpy as np 
import pandas as pd

# 4 Supervised Classification Learning Algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

Step 2: Load the dataset

>>> dataset = pd.read_csv(r"../dataset/student_result.csv")
>>> print(dataset.head())

   math  bangla  english  result
0    70      80       90       1
1    30      40       50       0
2    50      20       35       0
3    80      33       33       1
4    33      35       36       1

Step 3: Split the dataset into training and test sets

X = dataset.drop("result", axis=1) # X contains all the features
y = dataset["result"] # y contains only the label

# X_train contains features for training, X_test contains features for testing
# test_size = 0.3 means 30% data for testing
# random_state = 1, is the seed value used by the random number generator
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)

Step 4: Build the model and train it on the training data

clf_lr = LogisticRegression()

# fit the dataset into LogisticRegression Classifier
clf_lr.fit(X_train, y_train)
# predict on the unseen data
pred_lr = clf_lr.predict(X_test)

clf_knn = KNeighborsClassifier()
pred_knn = clf_knn.fit(X_train, y_train).predict(X_test) # method chaining

clf_rf = RandomForestClassifier(random_state=1)
pred_rf = clf_rf.fit(X_train, y_train).predict(X_test)

clf_dt = DecisionTreeClassifier()
pred_dt = clf_dt.fit(X_train, y_train).predict(X_test)

Step 5: Evaluate the model on the test data
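
The article gives no code for this step, but Step 1 imports `accuracy_score`, `confusion_matrix` and `classification_report`, so the evaluation presumably uses them. The sketch below shows one way to do that. Since `student_result.csv` is not available here, it builds a synthetic stand-in dataset with the same column names. The rule "pass only if every mark is at least 33" is my assumption and is not taken from the original data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for student_result.csv (NOT the article's data):
# a student passes (1) only if every mark is at least 33.
rng = np.random.default_rng(1)
marks = rng.integers(0, 101, size=(300, 3))
dataset = pd.DataFrame(marks, columns=["math", "bangla", "english"])
dataset["result"] = (marks >= 33).all(axis=1).astype(int)

# Steps 3 and 4, repeated so that this block runs on its own.
X = dataset.drop("result", axis=1)
y = dataset["result"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
clf_dt = DecisionTreeClassifier(random_state=1)
pred_dt = clf_dt.fit(X_train, y_train).predict(X_test)

# Fraction of test rows that were predicted correctly.
accuracy = accuracy_score(y_test, pred_dt)
print("accuracy:", accuracy)

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_test, pred_dt)
print(cm)

# Precision, recall and F1 score for each class.
print(classification_report(y_test, pred_dt))
```

The same three calls work for `pred_lr`, `pred_knn` and `pred_rf` from Step 4, which makes it easy to compare the four classifiers on the same test set.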

The NLTK library for Natural Language Processing (NLP) in Python

One of the first tasks in natural language processing is tokenization.
Tokenization means splitting a string into smaller pieces such as words, symbols, and numbers.

Example: splitting a sentence into words (tokenization) in Python:

>>> from nltk.tokenize import word_tokenize
>>> sentence = "Hello! My Name is Nasir Islam Sujan."

# word_tokenize splits the sentence into a list of tokens
>>> word_tokenize(sentence)
['Hello', '!', 'My', 'Name', 'is', 'Nasir', 'Islam', 'Sujan', '.']

Splitting a paragraph into words (tokenization) in Python:
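
The article gives no code for this case, so here is a minimal sketch. Note that it swaps in `wordpunct_tokenize` for the `word_tokenize` function used earlier. `wordpunct_tokenize` is a regular-expression tokenizer that works without extra data, whereas `word_tokenize` typically needs tokenizer data fetched through `nltk.download` first. The paragraph text is made up.

```python
from nltk.tokenize import wordpunct_tokenize

paragraph = (
    "Hello! My name is Nasir. "
    "I am learning machine learning with Python."
)

# wordpunct_tokenize separates runs of word characters from punctuation
# using a regular expression, so it needs no downloaded NLTK data.
tokens = wordpunct_tokenize(paragraph)
print(tokens)

# Keep only alphabetic tokens to get the plain words.
words = [token for token in tokens if token.isalpha()]
print(len(words), "words")
```

For ordinary text like this the two tokenizers give similar results. They differ on contractions such as "don't", which the regular-expression tokenizer splits at the apostrophe.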

