  • DataCamp
  • 2020-04-06
  • 2708
Python Tutorial: Basic feature extraction
Video description: Python Tutorial: Basic feature extraction

Want to learn more? Take the full course at https://learn.datacamp.com/courses/fe... at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.

---
In this video, we will learn to extract certain basic features from text. While not very powerful, they can give us a good idea of the text we are dealing with.

The most basic feature we can extract from text is the number of characters, including whitespace. For instance, the string "I don't know." has 13 characters. The number of characters is the length of the string, and Python gives us a built-in len() function, which returns the length of the string passed into it. Calling it on our string returns 13, as expected.

If our dataframe df has a textual feature (say 'review'), we can compute the number of characters for each review and store it as a new feature 'num_chars' by using the pandas DataFrame apply method. This is done by creating df['num_chars'] and assigning it to df['review'].apply(len).
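The character-count feature can be sketched as follows. The dataframe and its 'review' values are hypothetical sample data for illustration; only len and apply come from the transcript.

```python
import pandas as pd

# len() counts every character, including whitespace and punctuation
text = "I don't know."
print(len(text))  # 13

# Hypothetical dataframe with a textual 'review' column
df = pd.DataFrame({"review": ["I don't know.", "Great product!"]})

# Apply len to every review and store the result as a new feature
df["num_chars"] = df["review"].apply(len)
print(df["num_chars"].tolist())  # [13, 14]
```

For string columns, pandas also offers the vectorized df['review'].str.len(), which gives the same counts.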

Another feature we can compute is the number of words. Assuming that every word is separated by a space, we can use a string's split() method to convert it into a list where every element is a word. In this example, the string "Mary had a little lamb" is split into a list containing the words Mary, had, a, little, and lamb. We can now compute the number of words by counting the elements in this list using len().
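Applied to the transcript's example sentence, the splitting step looks like this. Calling split() with no argument splits on any run of whitespace, which is slightly more forgiving than the single-space assumption in the text.

```python
text = "Mary had a little lamb"

# split() with no argument splits on runs of whitespace
words = text.split()
print(words)  # ['Mary', 'had', 'a', 'little', 'lamb']

# The number of words is the number of elements in the list
print(len(words))  # 5
```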

To do this for a textual feature in a dataframe, we first define a function that takes in a string as an argument and returns the number of words in it. The steps inside the function are the same as before. We then pass this function, word_count, into apply: we create df['num_words'] and assign it to df['review'].apply(word_count).
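A minimal sketch of word_count and its use with apply, assuming the same hypothetical 'review' column as before (the sample reviews are made up).

```python
import pandas as pd

def word_count(string):
    """Return the number of whitespace-separated words in a string."""
    # Split the string into a list of words
    words = string.split()
    # The word count is the length of that list
    return len(words)

# Hypothetical dataframe with a textual 'review' column
df = pd.DataFrame({"review": ["Mary had a little lamb", "Loved it"]})

# Apply word_count to every review and store it as a new feature
df["num_words"] = df["review"].apply(word_count)
print(df["num_words"].tolist())  # [5, 2]
```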

Let's now compute the average length of words in a string. We define a function avg_word_length() which takes in a string and returns the average word length. We first split the string into words and compute the length of each word. Next, we compute the average word length by dividing the sum of the lengths of all words by the number of words. We can now pass this function into apply() to generate an average word length feature, as before.
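A sketch of avg_word_length() following the steps above. The empty-string guard and the column name 'avg_word_length' are additions for this sketch, not from the transcript; without the guard, an empty review would raise ZeroDivisionError. Because words are split only on whitespace, attached punctuation (as in "know.") counts toward word length.

```python
import pandas as pd

def avg_word_length(string):
    """Return the average length of the words in a string."""
    # Split the string into words
    words = string.split()
    # Guard added for this sketch: an empty string has no words
    if not words:
        return 0.0
    # Compute the length of each word
    word_lengths = [len(word) for word in words]
    # Divide the total length of all words by the number of words
    return sum(word_lengths) / len(words)

print(avg_word_length("Mary had a little lamb"))  # 3.6  (18 characters / 5 words)

# Hypothetical dataframe; the new column name is an assumption
df = pd.DataFrame({"review": ["Mary had a little lamb", "Loved it"]})
df["avg_word_length"] = df["review"].apply(avg_word_length)
print(df["avg_word_length"].tolist())  # [3.6, 3.5]
```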

When working with data such as tweets, it may be useful to compute the number of hashtags or mentions used. This tweet by DataCamp, for instance, has one mention, upendra_35, which begins with an @, and two hashtags, PySpark and Spark, which begin with a #.

Let's write a function that computes the number of hashtags in a string. We split the string into words. We then use a list comprehension to create a list containing only those words that are hashtags, using the startswith() string method to check whether a word begins with #. The final step is to return the number of elements in this list using len(). The procedure to compute the number of mentions is identical, except that we check whether a word starts with @.

Let's see this function in action. When we pass the string "@janedoe This is my first tweet! #FirstTweet #Happy", the function returns 2, which is indeed the number of hashtags in the string.
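The hashtag counter described above, together with its mention counterpart, might look like the following. The names hashtag_count and mention_count are hypothetical, since the transcript does not name these functions; both are simple whitespace-based heuristics.

```python
def hashtag_count(string):
    """Return the number of words in a string that start with '#'."""
    # Split the string into words
    words = string.split()
    # Keep only the words that are hashtags
    hashtags = [word for word in words if word.startswith('#')]
    # Return the number of hashtags
    return len(hashtags)

def mention_count(string):
    """Return the number of words in a string that start with '@'."""
    # Same procedure, but checking for '@' instead of '#'
    words = string.split()
    mentions = [word for word in words if word.startswith('@')]
    return len(mentions)

tweet = "@janedoe This is my first tweet! #FirstTweet #Happy"
print(hashtag_count(tweet))  # 2
print(mention_count(tweet))  # 1
```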

There are other basic features we can compute, such as the number of sentences, the number of paragraphs, the number of words starting with an uppercase letter, the number of all-capital words, and numeric quantities. The procedure for these is very similar to the ones we've already covered.
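The transcript does not define these extra features, so the sketch below uses deliberately naive heuristics, all of them assumptions: sentences end at '.', '!' or '?', paragraphs are separated by blank lines, and numbers are runs of digits with an optional decimal part. The function names are hypothetical. Abbreviations such as "e.g." would inflate the sentence count, for example, and all-caps words are also counted as capitalized.

```python
import re

def sentence_count(string):
    # Assumption: a sentence ends with one or more of '.', '!' or '?'
    parts = re.split(r"[.!?]+", string)
    return len([p for p in parts if p.strip()])

def paragraph_count(string):
    # Assumption: paragraphs are separated by blank lines
    paragraphs = re.split(r"\n\s*\n", string)
    return len([p for p in paragraphs if p.strip()])

def capitalized_word_count(string):
    # Words whose first character is an uppercase letter
    return len([w for w in string.split() if w[0].isupper()])

def all_caps_word_count(string):
    # str.isupper() needs at least one cased character, all uppercase
    return len([w for w in string.split() if w.isupper()])

def numeric_count(string):
    # Assumption: a number is digits with an optional decimal part
    return len(re.findall(r"\d+(?:\.\d+)?", string))

text = "NASA launched 2 rockets today. Both were LOUD!\n\nMore news tomorrow."
print(sentence_count(text))          # 3
print(paragraph_count(text))         # 2
print(capitalized_word_count(text))  # 4
print(all_caps_word_count(text))     # 2
print(numeric_count(text))           # 1
```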

That's enough theory for now. Let's practice!

#PythonTutorial #DataCamp #Engineering #NLP #Python #extraction
