Fun With Data

View Original

How to use the IMPORT functions in Google Sheets

What are Google Sheets Import Functions?

In a perfect world, all the data you need for a project will be neatly organized and packaged in a single file and the format you need. However, in the real world, this is often not the case. Oftentimes, you'll find yourself in a situation where you need to combine information from multiple sources or scrape data from various sources to create your own spreadsheet or CSV file. Thankfully, Google Sheets Import Functions come to the rescue, providing a hassle-free way for users to import data from a wide range of sources. These functions offer flexibility and convenience, allowing you to gather data from online sources, including other Google Sheets, with ease.

Let's explore the five Import Functions in Google Sheets that can help you bring in data from online sources. These functions include (click the links to jump to them):

  • IMPORTHTML: This function allows you to import data from HTML tables or lists.

  • IMPORTDATA: With this function, you can import data from a CSV or TSV file.

  • IMPORTXML: If you need to import data from an XML file, this function will come in handy.

  • IMPORTRANGE: This function allows you to import data from a specified range in another Google Sheets document.

  • IMPORTFEED: If you want to import data from an RSS or Atom feed, this function is the one to use.


See this content in the original post

What it does: IMPORTHTML is designed to extract tables or lists from HTML web pages. You provide the URL and specify the table's index or the list's position on the page.

How it works: The function looks something like this =IMPORTHTML(“url”, “query”, index). For this to work, the URL must appear between quotation marks and the same goes for the query. The types of queries accepted by this function are table and list. As for the index, both tables and lists start at 1. So if the web page has tables and lists, it’s possible to have an index of 1 for each of them. If you don’t need the entire table, you can import data from specific columns. All you need to do is wrap the IMPORTHTML function within an index function and include the columns and rows you want to reference. It looks like this =Index(IMPORTHTML(“url”, “query”, index), 3,2).

Why you’ll want to use it: This function simplifies the process of importing data from web pages, allowing users to effortlessly integrate data from external sources into their spreadsheets. It's particularly useful for web scraping and monitoring changes on websites.

What it can’t do well: IMPORTHTML may not always work perfectly with complex web page structures or sites that frequently change their layout. It's sensitive to any adjustments in the source HTML, which can break the import.

Try it yourself: Open up Google Sheets and try =IMPORTHTML("https://en.wikipedia.org/wiki/List_of_tallest_buildings", "table", 2). This function will return a table of the world’s tallest buildings found on this Wikipedia page.

Or try =Index(IMPORTHTML("https://en.wikipedia.org/wiki/List_of_tallest_buildings", "table", 2)3, 2). This will pull in the info found in a specific cell. In this case it’ll be the string or number found in column 3 on row 2 of the table above.

See this content in the original post

What it does: The IMPORTDATA function is used to retrieve data from a CSV or TSV file and format the information within your spreadsheet.

How it works: The function looks like this =IMPORTDATA(“url”). The function only has one parameter, the URL of the file you want to import data from. Make sure the URL is wrapped in quotation marks and include the protocol.

Why you’ll want to use it: The primary advantage of IMPORTDATA is its simplicity. It allows you to quickly import data from online sources without the need for complex configurations. This is invaluable when you need to keep your spreadsheet up-to-date with live data, like stock prices or weather information.

Its limitations: IMPORTDATA can only pull data from publicly accessible URLs. Additionally, it does not support authentication or complex API calls, limiting its use in cases that require advanced data manipulation.

Try it yourself: Open up Google Sheets and try =importdata("https://data.ontario.ca/dataset/87f52cc1-1576-4c88-bb1d-07aea73148c4/resource/4f4b2846-a36a-4e79-8429-9831c826fe2e/download/list_of_all_fia_registered_en_v3.csv") This will pull in data from a dataset that contains a list of registered farm implement dealers, distributors and dealer/distributors in Ontario from the Ontario Data Catalogue.


See this content in the original post

What it does: IMPORTXML enables users to import data from XML documents, such as RSS feeds, APIs, or custom XML files. You provide the URL and an XPath query to specify the data you want to retrieve.

How it works: The IMPORTXML function has three inputs - URL, XPath Query and locale. The first two must be included. If the locale is left unspecified, the document locale will be used. The function looks like this: =importxml(“url”, xpath_query, locale). The URL must be enclosed in quotation marks. 

If you’re like us and not very familiar with XML files or XPath or you’d like a refresher, there is a great overview of both on W3Schools. Here are the links: XML Overview  | XPath Overview

Why you’ll want to use it: The versatility of IMPORTXML allows you to access structured data from various XML sources. It's particularly valuable for extracting specific pieces of information from XML documents, making it indispensable for data extraction projects.

Its limitations: Like IMPORTHTML, IMPORTXML is sensitive to changes in the source XML structure. Additionally, it might not work with highly complex or dynamic XML sources.

Try it yourself: Open up Google Sheets and try =IMPORTXML("https://en.wikipedia.org/wiki/Moon_landing", "//a/@href") The function will retrieve and return the href attribute values of all the <a> tags in the HTML page located at https://en.wikipedia.org/wiki/Moon_landing, effectively providing a list of all the links present on that webpage.

See this content in the original post

What is does: IMPORTRANGE is used to import data from one Google Sheets document to another. You provide the source spreadsheet's URL and specify the range of cells to import. It’s useful when you need to blend data from multiple spreadsheets or want to keep a master file of the data as is.

How it works: The function looks like this =IMPORTRANGE(“spreadsheet_url” , “rangestring“). The first parameter is the URL of the spreadsheet. The second is the data you want to copy over. It must be written in the following format: “Name of the sheet!cells” For example, if your sheet is called Sheet1 and the data you want to copy is found in cells B2 to B9 then the formula would look like =IMPORTRANGE(“http//googlesheets.com”, Sheet1!B2:B9)

Why you’ll want to use it: Collaborative projects benefit greatly from IMPORTRANGE since it allows for real-time data sharing between Google Sheets documents. It ensures that everyone has access to the most up-to-date information, making it easier to collaborate and make informed decisions.

Its limitations: The main limitation of IMPORTRANGE is that it can only import data from Google Sheets documents that you have access to. Sharing permissions must be set up correctly, and the data source must be online.

See this content in the original post

What it does: The IMPORTFEED function is used to import RSS or Atom feeds from websites. You provide the feed URL, and the function retrieves the latest feed entries.

How it works: The function looks like this =IMPORTFEED(URL, query, headers, num_items). Similar to the other functions, the URL must be in quote marks and it must include the protocol in the address. The URL is the only mandatory parameter, the others are optional. The query parameter defines the type of information you want to get from the feed. You can query the following:

  • items will return a table with all the articles published to the feed. It will look like this: =importfeed("https://feeds.npr.org/1002/rss.xml", "items",true)

  • items <type> will return a list from a specific column. For example =importfeed("https://feeds.npr.org/1002/rss.xml", "items title",true) will provide you with a list of the titles of the articles on the NPR RSS feed. You can change items to author to get the names of the authors who have published to the feed, or to URL to get links to the specific stories that have been published to the feed.

  • feed will return a row with details about the feed including URL, title, and description.

  • feed <type> will return a specific attribute of the feed such as the description, author, title, or URL.

The headers parameter specifies whether you want to have headers. It is defaulted to false. If you want headers to show, you’ll have to add true to the function. Lastly, you can use the num_items parameter to specify the number of items in the feed.

Why you’ll want to use it: This function is ideal for keeping up with news updates, blog posts, or other regularly updated content from websites. It automates the process of fetching new data, ensuring your spreadsheet is always current.

Its limitations: IMPORTFEED can be limited by the frequency of updates in the feed. Some feeds may not update frequently, and others may have restrictions on how frequently you can fetch data.

Try it yourself: Open up Google Sheets and try =importfeed("https://feeds.npr.org/1002/rss.xml", "items",true). You should get a table of the most recent articles published in the NPR RSS Feed that include the title, author, URL, date published, and a summary.

In conclusion, Google Sheets Import Functions are powerful tools that significantly enhance the capabilities of your spreadsheets. Understanding when and how to use each function can simplify data management tasks and streamline your workflow. However, it's essential to be aware of their limitations to ensure smooth data imports and avoid unexpected issues. By harnessing the potential of these functions, you can master data manipulation in Google Sheets and make better-informed decisions in your projects.

Looking for more Google Sheets functions? Check out the full list here.

You may also like

See this content in the original post