How to build a bot with Python and Google BigQuery to automate your mindless tasks

Do you have repetitive tasks? Something that you do regularly, every week or even every day? Reporting might be one of your weekly or daily tasks. You query or ask for the data, and do some visualizations, then give it to your boss. What if, instead of doing it manually, you were to automate it so you don’t have to do the boring stuff, and you can use your precious time to do other things?

In this tutorial, we are going to make a Telegram Bot that will automate the boring parts of your job — reporting. Oh, and did I mention that it won’t take more than 50 lines of code to build it? ;)

If this is your first time building a Telegram bot, you might want to read an introductory guide on creating one first.

Getting started

1. Install the libraries

We are going to use google-cloud-bigquery to query the data from Google BigQuery. pandas, matplotlib, and numpy will help us with the data visualization. python-telegram-bot will send the visualization image through Telegram Chat.

pip3 install google-cloud-bigquery matplotlib numpy pandas python-telegram-bot
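Note that the code in this tutorial uses the older (bot, update) callback style of python-telegram-bot. If the version installed by the command above no longer accepts that style, pinning an older release should keep the examples working, for example:

pip3 install 'python-telegram-bot<13'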

2. Enable Google BigQuery API

We need to enable the Google BigQuery API first if we want to use the service.

Go to the Google Cloud Console and create a new project (or select the one you have).

In the project dashboard, click ENABLE APIS AND SERVICES, and search for BigQuery API.

Click ENABLE to enable the API.

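If you prefer the terminal, the same step can be done with the gcloud CLI (assuming it is installed and authenticated; my-report-bot below is a placeholder for your own project ID):

# Point gcloud at your project and enable the BigQuery API
gcloud config set project my-report-bot
gcloud services enable bigquery.googleapis.com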

3. Create the service account key

If we want to use Google Cloud services like Google BigQuery, we need a service account key. This is like our credentials to use Google’s services.

In the Google Cloud Console, open the Credentials page, choose Create credentials, and click Service account key.

Choose New service account and enter a name in the Service account name field.

From the Role drop-down list, select Project > Owner, then click Create.

A .json file will be downloaded automatically; name it creds.json.

Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of our creds.json file in the terminal.

export GOOGLE_APPLICATION_CREDENTIALS='[PATH_TO_CREDS.JSON]'

Everything should be good now, it is time to write our program.

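Note that exporting GOOGLE_APPLICATION_CREDENTIALS is not the only option: the BigQuery client can also load the key file directly in code. A minimal sketch, assuming the creds.json from the previous step sits next to the script:

from google.cloud import bigquery

# Alternative to the environment variable: build the client straight
# from the service account key file.
client = bigquery.Client.from_service_account_json('creds.json')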

Write the program

We are going to write the program that will query the data from BigQuery (we assume the data is stored there). Then we will visualize the data and save as an image. The image will then be sent through Telegram Chat.

For this tutorial, we are using the bigquery-public-data.stackoverflow dataset, and we will take the daily total posts data for our report.

The workflow of our program is pretty simple:

Query the table -> Visualize the data -> Save the visualization -> Send the image

Let's define a function for each step of the flow.

1. Query to BigQuery

Import the library first.

from google.cloud import bigquery

Make a function called query_to_bigquery which takes query as the parameter.

def query_to_bigquery(query):
    client = bigquery.Client()
    query_job = client.query(query)
    result = query_job.result()
    dataframe = result.to_dataframe()
    return dataframe

This function will return the data as a dataframe.

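To sanity-check the setup before writing the real query, you can call it with a trivial statement (a quick test, not part of the final script):

# Quick test: should print a one-row dataframe with a single column x
print(query_to_bigquery('SELECT 1 AS x'))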

2. Visualize the data

We are going to use matplotlib to visualize the data.

import matplotlib.pyplot as plt
import numpy as np  # used below to position the bars on the x-axis

We take five parameters which are x as the x-axis data, x_label as the x-axis label name, y as the y-axis data, y_label as the y-axis label name, and title as our visualization title.

def visualize_bar_chart(x, x_label, y, y_label, title):
    plt.title(title)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    index = np.arange(len(x))
    plt.xticks(index, x, fontsize=5, rotation=30)
    plt.bar(index, y)
    return plt

3. Save the image

Let’s use the two functions above to create a visualization then save the image.

Like I mentioned before, we want to send the daily total posts data. Write the query first.

query = """             SELECT DATE(creation_date) date, COUNT(*) total_posts            FROM `bigquery-public-data.stackoverflow.post_history`            GROUP BY 1            HAVING date > DATE_SUB('2018-12-02', INTERVAL 14 DAY)            ORDER BY 1        """

Note that in the query above, HAVING date > DATE_SUB('2018-12-02', INTERVAL 14 DAY) means we only keep the 14 days of data leading up to 2018-12-02.

We use that date because 2018-12-02 is the last date recorded in bigquery-public-data.stackoverflow.post_history; in other cases you might want to use CURRENT_DATE() instead so you always get the newest data.

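For your own, continuously updated tables, that variant of the query could look like this (a sketch; the rest of this tutorial keeps the fixed date so the results stay reproducible against the public dataset):

# Hypothetical variant: always report on the 14 days before today
query = """
    SELECT DATE(creation_date) date, COUNT(*) total_posts
    FROM `bigquery-public-data.stackoverflow.post_history`
    GROUP BY 1
    HAVING date > DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY)
    ORDER BY 1
"""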

Call the query_to_bigquery function to get the data.

dataframe = query_to_bigquery(query)

Take the date column as our x-axis data, and total_posts column as our y-axis data.

x = dataframe['date'].tolist()
y = dataframe['total_posts'].tolist()

Visualize the data using the visualize_bar_chart function, then save it as an image.

plt = visualize_bar_chart(x=x,
                          x_label='Date',
                          y=y,
                          y_label='Total Posts',
                          title='Daily Posts')
plt.savefig('viz.png')

Wrap that code in a function called get_and_save_image.

def get_and_save_image():
    query = """
        SELECT DATE(creation_date) date, COUNT(*) total_posts
        FROM `bigquery-public-data.stackoverflow.post_history`
        GROUP BY 1
        HAVING date > DATE_SUB('2018-12-02', INTERVAL 14 DAY)
        ORDER BY 1
    """
    dataframe = query_to_bigquery(query)
    x = dataframe['date'].tolist()
    y = dataframe['total_posts'].tolist()
    plt = visualize_bar_chart(x=x,
                              x_label='Date',
                              y=y,
                              y_label='Total Posts',
                              title='Daily Posts')
    plt.savefig('viz.png')

4. Send the image

To be able to send it to the right person, we need to know their chat_id, because that is one of the required parameters.

To find it, send /start to a Telegram bot that echoes account details (for example, @userinfobot). The bot will reply with our user information, and our chat_id is the number in the Id field.

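If you would rather not rely on a third-party bot, you can run a small throwaway script so your own bot tells you the ID. A sketch using the same pre-v12 python-telegram-bot API as the rest of this tutorial (the /whoami command name is just an example):

from telegram.ext import CommandHandler, Updater

def whoami(bot, update):
    # Reply with the chat_id of whoever sent /whoami
    update.message.reply_text('Your chat_id is {}'.format(update.message.chat_id))

updater = Updater('YOUR_TOKEN')
updater.dispatcher.add_handler(CommandHandler('whoami', whoami))
updater.start_polling()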

Make a function called send_image. This function will first call the get_and_save_image function to create and save the visualization, then send it to the person whose chat_id is declared in the chat_id variable.

def send_image(bot, update):
    get_and_save_image()
    chat_id = 'CHAT_ID_RECEIVER'
    bot.send_photo(chat_id=chat_id, photo=open('viz.png', 'rb'))

5. Main program

Lastly, create another function called main to run our program. Don't forget to replace YOUR_TOKEN with your bot's token.

Remember, this program will send the image automatically at the time and on the days we define. For example, in this tutorial we will set it to 9:00 AM every day.

def main():
    updater = Updater('YOUR_TOKEN')
    updater.job_queue.run_daily(send_image,
                                time=datetime.datetime.strptime('9:00AM', '%I:%M%p').time(),
                                days=(0, 1, 2, 3, 4, 5, 6))
    updater.start_polling()
    updater.idle()

if __name__ == '__main__':
    main()

At the end your code should look like this:

from google.cloud import bigquery
from telegram.ext import Updater
import matplotlib.pyplot as plt
import numpy as np
import datetime


def query_to_bigquery(query):
    client = bigquery.Client()
    query_job = client.query(query)
    result = query_job.result()
    dataframe = result.to_dataframe()
    return dataframe


def visualize_bar_chart(x, x_label, y, y_label, title):
    plt.title(title)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    index = np.arange(len(x))
    plt.xticks(index, x, fontsize=5, rotation=30)
    plt.bar(index, y)
    return plt


def get_and_save_image():
    query = """
        SELECT DATE(creation_date) date, COUNT(*) total_posts
        FROM `bigquery-public-data.stackoverflow.post_history`
        GROUP BY 1
        HAVING date > DATE_SUB('2018-12-02', INTERVAL 14 DAY)
        ORDER BY 1
    """
    dataframe = query_to_bigquery(query)
    x = dataframe['date'].tolist()
    y = dataframe['total_posts'].tolist()
    plt = visualize_bar_chart(x=x, x_label='Date', y=y, y_label='Total Posts', title='Daily Posts')
    plt.savefig('viz.png')


def send_image(bot, update):
    get_and_save_image()
    chat_id = 'CHAT_ID_RECEIVER'
    bot.send_photo(chat_id=chat_id, photo=open('viz.png', 'rb'))


def main():
    updater = Updater('YOUR_TOKEN')
    updater.job_queue.run_daily(send_image,
                                time=datetime.datetime.strptime('9:00AM', '%I:%M%p').time(),
                                days=(0, 1, 2, 3, 4, 5, 6))
    updater.start_polling()
    updater.idle()


if __name__ == '__main__':
    main()

Save the file and name it main.py.

Run the program by typing this command in the terminal:

python3 main.py

Great! Now you have an automatic report generator built with no more than 50 lines of code — pretty cool right?

Go check your bot in Telegram, and type the /send command to see an example of the image visualization.

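One caveat: the main() function above only schedules the daily job, so the bot will not respond to /send unless a command handler is also registered before start_polling(). A minimal sketch of that registration, using the same pre-v12 python-telegram-bot API:

from telegram.ext import CommandHandler

# Add this inside main(), before updater.start_polling()
updater.dispatcher.add_handler(CommandHandler('send', send_image))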

The image below shows the visualization that the bot will send. Now you can just sit back, relax, and wait for the bot to send the report to you everyday :)

You can visit my profile to get the code, and please do not hesitate to connect and leave a message if you want to ask about anything.

Please leave a comment if you think there are any errors in my code or writing.

If you have an interest in data science or machine learning, you might want to read my other posts.

Once again, thank you and good luck! :)
