Android-x86

kernel: Commit

Commit MetaInfo

Revision: 9ec9344a69c00f09a71ec855bfac5e7853919c9a (tree)
Time: 2019-06-13 01:06:19
Author: JP Abgrall <jpa@goog...>
Committer: Chih-Wei Huang

Log Message

ANDROID: netfilter: xt_qtaguid: add qtaguid matching module

This module allows tracking stats at the socket level for given UIDs.
It replaces xt_owner.
If the --uid-owner is not specified, it will just count stats based on
who the skb belongs to. This will even happen on incoming skbs as it
looks into the skb via xt_socket magic to see who owns it.
If an skb is lost, it will be assigned to uid=0.

To control what sockets of what UIDs are tagged by what, one uses:

echo t $sock_fd $accounting_tag $the_billed_uid \
> /proc/net/xt_qtaguid/ctrl
So whenever an skb belongs to a sock_fd, it will be accounted against
$the_billed_uid, and matching stats will show up under the uid with the
given $accounting_tag.
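For illustration, a minimal userspace sketch of the same tagging command in C. The fd, tag value, and uid are invented for the example; the assumption that the accounting tag occupies the upper 32 bits of the full tag (uid in the lower 32) follows the get_uid_from_tag()/get_atag_from_tag() helpers in the diff below:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch only: tag sock_fd with an accounting tag and bill it to uid. */
static int qtaguid_tag_socket(int sock_fd, unsigned long long acct_tag,
                              unsigned int uid)
{
        char cmd[64];
        int ctrl = open("/proc/net/xt_qtaguid/ctrl", O_WRONLY);

        if (ctrl < 0)
                return -1;
        snprintf(cmd, sizeof(cmd), "t %d %llu %u", sock_fd, acct_tag, uid);
        write(ctrl, cmd, strlen(cmd));
        close(ctrl);
        return 0;
}

/* Usage (hypothetical values): qtaguid_tag_socket(42, 1ULL << 32, 10001); */
```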

Because the number of allocations for the stats structs is not that big:

~500 apps * 32 per app (on the order of 16,000 entries)

we'll just do the allocations atomically (GFP_ATOMIC). This avoids walking
lists many times and the fancy worker-thread handling. Slabs will grow when
needed later.

It uses netdevice and inetaddr notifications instead of hooks in the core dev
code to track when a device comes and goes. This removes the need for an
exposed iface_stat.h.

Put procfs dirs in /proc/net/xt_qtaguid/

ctrl
stats
iface_stat/<iface>/...

The uid stats are obtainable in ./stats.

Change-Id: I01af4fd91c8de651668d3decb76d9bdc1e343919
Signed-off-by: JP Abgrall <jpa@google.com>

[AmitP: Folded following android-4.9 commit changes into this patch

e5d798684a71 ("ANDROID: netfilter: qtaguid: initialize a local var to keep compiler happy")]

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

ANDROID: netfilter: xt_qtaguid: fix ipv6 protocol lookup

When updating the stats for a given uid, it would incorrectly assume
IPv4 and pick up the wrong protocol for IPv6 packets.

Change-Id: Iea4a635012b4123bf7aa93809011b7b2040bb3d5
Signed-off-by: JP Abgrall <jpa@google.com>

ANDROID: netfilter: xt_qtaguid: start tracking iface rx/tx at low level

qtaguid tracks device stats by monitoring when the device goes up and down,
then reading its dev_stats(). But devices don't report stats correctly
(either they don't count headers symmetrically between rx/tx, or they count
internal control messages).

Now qtaguid counts the rx/tx bytes/packets during raw:prerouting and
mangle:postrouting (nat is not available in ipv6).

The results are in

/proc/net/xt_qtaguid/iface_stat_fmt

which outputs a format line (bash expansion):

ifname total_skb_{rx,tx}_{bytes,packets}

Added event counters for pre/post handling.
Added extra ctrl_*() pid/uid debugging.

Change-Id: Id84345d544ad1dd5f63e3842cab229e71d339297
Signed-off-by: JP Abgrall <jpa@google.com>

ANDROID: netfilter: xt_qtaguid: report only uid tags to non-privileged processes

In the past, a process could only see its own stats (uid-based summary,
and details).
Now we allow any process to see other UIDs' uid-based stats, but still
hide the detailed stats.

Change-Id: I7666961ed244ac1d9359c339b048799e5db9facc
Signed-off-by: JP Abgrall <jpa@google.com>

ANDROID: netfilter: xt_qtaguid: fix error exit that would keep a spinlock.

qtudev_open() could return with the uid_tag_data_tree_lock held
when a kzalloc(..., GFP_ATOMIC) failed.
Very unlikely to get triggered AND survive the mayhem of running out of memory.

Signed-off-by: JP Abgrall <jpa@google.com>

ANDROID: netfilter: xt_qtaguid: Don't BUG_ON if create_if_tag_stat fails

If create_if_tag_stat fails to allocate memory (GFP_ATOMIC) the
following will happen:

qtaguid: iface_stat: tag stat alloc failed
...
kernel BUG at xt_qtaguid.c:1482!

Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com>

ANDROID: netfilter: xt_qtaguid: remove AID_* dependency for access control

qtaguid limits what can be done with /ctrl and /stats based on group
membership.
This change removes AID_NET_BW_STATS and AID_NET_BW_ACCT, and picks
up the groups from the gid of the matching proc entry files.

Signed-off-by: JP Abgrall <jpa@google.com>
Change-Id: I42e477adde78a12ed5eb58fbc0b277cdaadb6f94

ANDROID: netfilter: xt_qtaguid: extend iface stat to report protocols

In the past, iface_stat_fmt would only show global bytes/packets
for the skb-based numbers.
For stall detection in userspace, distinguishing TCP from other protocols
makes detection easier.
Now we report

ifname total_skb_rx_bytes total_skb_rx_packets total_skb_tx_bytes
total_skb_tx_packets {rx,tx}_{tcp,udp,other}_{bytes,packets}
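As an illustration of the field order (which matches the header printed by pp_iface_stat_header() in the diff below), a minimal userspace parse sketch in C; only the first five fields are read here:

```c
#include <stdio.h>

/* Sketch: parse the leading fields of one iface_stat_fmt line; the
 * {rx,tx}_{tcp,udp,other}_{bytes,packets} pairs follow in that order. */
static int parse_iface_line(const char *line)
{
        char ifname[16];
        unsigned long long rx_b, rx_p, tx_b, tx_p;

        if (sscanf(line, "%15s %llu %llu %llu %llu",
                   ifname, &rx_b, &rx_p, &tx_b, &tx_p) != 5)
                return -1;
        printf("%s: rx=%llu/%llu tx=%llu/%llu\n",
               ifname, rx_b, rx_p, tx_b, tx_p);
        return 0;
}
```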

Bug: 6818637
Signed-off-by: JP Abgrall <jpa@google.com>

ANDROID: netfilter: xt_qtaguid: Allow tracking loopback

In the past it would always ignore interfaces with loopback addresses.
Now we just treat them like any other.
This also helps with writing tests that check for the presence
of the qtaguid module.

Signed-off-by: JP Abgrall <jpa@google.com>

ANDROID: netfilter: xt_qtaguid: rate limit some of the printks

Some of the printks are in the packet handling path.
We now ratelimit the very unlikely errors to avoid
kmsg spamming.

Signed-off-by: JP Abgrall <jpa@google.com>

ANDROID: netfilter: xt_qtaguid: 3.10 fixes

Stop using obsolete procfs api.

Signed-off-by: Arve Hjønnevåg <arve@android.com>

[AmitP: Folded following android-4.9 commit changes into this patch

564729173b12 ("netfilter: xt_qtaguid: fix memory leak in seq_file handlers")
85a2eb5b48fc ("ANDROID: netfilter: xt_qtaguid: 64-bit warning fixes")]

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

ANDROID: netfilter: xt_qtaguid: fix bad tcp_time_wait sock handling

Since commit 41063e9 ("ipv4: Early TCP socket demux"), skbs can have an sk
which is not a full struct sock but the smaller struct inet_timewait_sock,
which has no sk->sk_socket. Now we bypass sockets with
sk_state == TCP_TIME_WAIT.
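A minimal sketch of the kind of guard this implies (illustrative, not the literal patch):

```c
/* Sketch: skbs early-demuxed to a TIME_WAIT socket carry a struct
 * inet_timewait_sock, which has no sk_socket, so skip those. */
if (sk && sk->sk_state == TCP_TIME_WAIT)
        sk = NULL;              /* treat as untagged/unknown owner */
```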

Signed-off-by: JP Abgrall <jpa@google.com>

ANDROID: netfilter: xt_qtaguid: Fix boot panic

We need the change below because of mainline commit 351638e7de (net: pass
info struct via netdevice notifier). Otherwise we panic.

Change-Id: I7daf7513a733933fdcbaeebea7f8191f8b6a0432
Signed-off-by: John Stultz <john.stultz@linaro.org>

ANDROID: netfilter: xt_qtaguid/xt_socket: Build fixups

Fix up kuid/kgid build issues in netfilter code.

Also re-add the xt_socket_get/put_sk interfaces needed by xt_qtaguid.

Change-Id: I7027fb840e109785bddffe8ea717b8d018b26d82
Signed-off-by: John Stultz <john.stultz@linaro.org>

[AmitP: Folded following android-4.9 commit changes into this patch

da5ea99a74f2 ("ANDROID: netfilter: xt_qtaguid: fix seq_printf type mismatch warning")
070eff8f023c ("ANDROID: netfilter: xt_qtaguid: fix broken uid/gid range check")
2879b6ec24ee ("ANDROID: xt_qtaguid: use sock_gen_put() instead of xt_socket_put_sk()")]

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

ANDROID: netfilter: xt_qtaguid: Use sk_callback_lock read locks before reading sk->sk_socket

It prevents a kernel panic when accessing sk->sk_socket fields, because
sock_orphan() NULLs sk->sk_socket when called through sk_common_release().
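A minimal sketch of the locking pattern this describes (illustrative; the exact call sites are in the module):

```c
/* Sketch: hold the read side of sk_callback_lock across the
 * sk->sk_socket dereference so sock_orphan() cannot NULL it mid-read. */
read_lock_bh(&sk->sk_callback_lock);
if (sk->sk_socket && sk->sk_socket->file) {
        /* ... read uid/gid from the socket file's credentials ... */
}
read_unlock_bh(&sk->sk_callback_lock);
```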

Change-Id: I4aa46b4e2d8600e4d4ef8dcdd363aa4e6e5f8433
Signed-off-by: Mohamad Ayyash <mkayyash@google.com>
(cherry picked from commit cdea0ebcb8bcfe57688f6cb692b49e550ebd9796)
Signed-off-by: John Stultz <john.stultz@linaro.org>

ANDROID: netfilter: xt_qtaguid: xt_socket: build fixes

Add missing header <linux/miscdevice.h> and use
xt_socket_lookup_slow_v* instead of xt_socket_get*_sk
in xt_qtaguid.c.

Fix xt_socket_lookup_slow_v* functions in xt_socket.c
and declare them in xt_socket.h

Change-Id: I55819b2d4ffa82a2be20995c87d28fb5cc77b5ba
Signed-off-by: John Stultz <john.stultz@linaro.org>

[AmitP: Upstream commit 8db4c5be88f6 ("netfilter: move socket lookup
infrastructure to nf_socket_ipv{4,6}.c") moved socket lookup
to nf_socket_ipv{4,6}.c, hence use nf_sk_lookup_slow_v[4|6]()
instead of obsolete xt_socket_lookup_slow_v[4|6]().
Also folded following android-4.9 commit changes into this patch:

7de1bb86dc5a ("ANDROID: netfilter: xt_qtaguid/socket: build fixes for 4.4")
5b5ab94817f9 ("ANDROID: netfilter: xt_qtaguid: seq_printf fixes")]

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>

ANDROID: netfilter: xt_qtaguid: fix a race condition in if_tag_stat_update

Lock protection is missing in if_tag_stat_update() around get_iface_entry().
So if one CPU is doing iface_stat_create() while another CPU is doing
if_tag_stat_update(), a race will happen.
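Given that get_iface_entry()'s own comment (in the diff below) says the caller must hold iface_stat_list_lock, the shape of the fix is roughly:

```c
/* Sketch: take the list lock around the lookup and the use of the
 * entry, so a concurrent iface_stat_create() cannot race with us. */
spin_lock_bh(&iface_stat_list_lock);
iface_entry = get_iface_entry(ifname);
if (iface_entry) {
        /* ... update per-tag stats for this interface ... */
}
spin_unlock_bh(&iface_stat_list_lock);
```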

Change-Id: Ib8d98e542f4e385685499f5b7bb7354f08654a75
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>

ANDROID: netfilter: xt_qtaguid: Fix panic caused by synack processing

In upstream commit ca6fb06518836ef9b65dc0aac02ff97704d52a05
(tcp: attach SYNACK messages to request sockets instead of
listener)
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ca6fb0651883

The building of synack messages was changed, so that skb->sk points
to a cast request_sock. This is problematic, as there is no sk_socket
in a request_sock. So when the qtaguid_mt function tries to access
sk->sk_socket, it accesses uninitialized memory.

After looking at how other netfilter implementations handle this,
I realized there was a skb_to_full_sk() helper added, which the
xt_qtaguid code isn't yet using.

This patch adds its use, and resolves panics seen when accessing
uninitialized memory while processing synack packets.

Reported-by: YongQin Liu <yongquin.liu@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>

ANDROID: netfilter: xt_qtaguid: Fix panic caused by processing non-full socket.

In an issue very similar to 4e461c777e3 (xt_qtaguid: Fix panic
caused by synack processing), we were seeing panics on occasion
in testing.

In this case it was the same issue, but caused by a different
call path: the sk being returned from qtaguid_find_sk() was
not a full socket, so the sk->sk_socket dereference failed.

This patch adds an extra check to ensure the sk being returned
is a full socket, and returns NULL if it is not.
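A minimal sketch combining the guards from this and the previous fix (skb_to_full_sk() is the upstream helper named above; using sk_fullsock() for the full-socket check is our assumption):

```c
/* Sketch: resolve a request_sock to its listener, then refuse anything
 * that still isn't a full socket before touching sk->sk_socket. */
struct sock *sk = skb_to_full_sk(skb);

if (sk && !sk_fullsock(sk))
        sk = NULL;
if (!sk || !sk->sk_socket)
        return NULL;            /* nothing safe to dereference */
```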

Reported-by: Milosz Wasilewski <milosz.wasilewski@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>

ANDROID: netfilter: xt_qtaguid: Don't show empty tag stats for unprivileged uids

BUG: 27577101
BUG: 27532522
Change-Id: Ibee3c5d224f139b9312a40acb203e87aa7060797
Signed-off-by: Mohamad Ayyash <mkayyash@google.com>

ANDROID: netfilter: xt_qtaguid: fix the deadlock when enable DDEBUG

When DDEBUG is enabled, the prdebug_full_state() function will try to
recursively acquire the sock_tag_list spinlock, causing a deadlock. A
check is added before it acquires the spinlock, to differentiate the
behavior depending on the caller of the function.

Bug: 36559739
Test: Compile and run test under system/extra/test/iptables/
Change-Id: Ie3397fbaa207e14fe214d47aaf5e8ca1f4a712ee
Signed-off-by: Chenbo Feng <fengc@google.com>
(cherry picked from commit f0faedd6b468777f3bb5834f97100794d562c8b7)

ANDROID: netfilter: xt_qtaguid: don't check if embedded arrays are NULL

clang warns about four NULL pointer checks:

net/netfilter/xt_qtaguid.c:973:11: warning: address of array 'ifa->ifa_label' will always evaluate to 'true' [-Wpointer-bool-conversion]
net/netfilter/xt_qtaguid.c:974:13: warning: address of array 'ifa->ifa_label' will always evaluate to 'true' [-Wpointer-bool-conversion]
net/netfilter/xt_qtaguid.c:1212:31: warning: address of array 'el_dev->name' will always evaluate to 'true' [-Wpointer-bool-conversion]
net/netfilter/xt_qtaguid.c:1640:31: warning: address of array 'el_dev->name' will always evaluate to 'true' [-Wpointer-bool-conversion]

Both of these fields are embedded char[16] arrays rather than pointers,
so they can never be NULL.
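A tiny illustration of why the warning fires (hypothetical struct mirroring the embedded-array layout):

```c
struct example_dev {
        char name[16];          /* embedded array, not a pointer */
};

static int bad_check(const struct example_dev *dev)
{
        /* dev->name decays to a non-NULL address, so this branch is
         * dead code and clang emits -Wpointer-bool-conversion. */
        if (!dev->name)
                return -1;
        return 0;
}
```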

Change-Id: I748ff6dd11569e5596a9d5cecdf9c334847e7307
Signed-off-by: Greg Hackmann <ghackmann@google.com>

ANDROID: netfilter: xt_qtaguid: Add untag hacks to inet_release function

To prevent the potential memory leak caused by closing a socket without
untagging it from the qtaguid module, the module no longer holds any
socket file reference count. Instead, it increases the sk_refcnt of the
sk struct to prevent reuse of the socket pointer. When a socket is
released, it deletes the tag if the socket was previously tagged, so no
more resources are held by the xt_qtaguid module. A flag is added to the
untag process to prevent a possible kernel crash caused by failure to
delete the corresponding socket_tag_entry list.
Bug: 36374484
Test: compile and run test under system/extra/test/iptables,

run cts -m CtsNetTestCases -t android.net.cts.SocketRefCntTest

Signed-off-by: Chenbo Feng <fengc@google.com>
Change-Id: Iea7c3bf0c59b9774a5114af905b2405f6bc9ee52

ANDROID: netfilter: xt_qtaguid: handle properly request sockets

To match rules related to uid/gid for packets in the SYN_RECV state,
we need to get the full socket from the request_sock struct.

Bug: 63917742
Change-Id: I03acb2251319fd800d0e36a6dde30fc1fbb7d1b0
Signed-off-by: Simon Dubray <simonx.dubray@intel.com>

ANDROID: netfilter: xt_qtaguid: fix handling for cases where tunnels are used.

* fix skb->dev vs par->in/out

When there is some forwarding going on, it introduces extra state
around devs associated with xt_action_param->in/out and sk_buff->dev.
E.g.:

  par->in and par->out are both set, or
  skb->dev and par->out are both set (and different)

This would lead qtaguid to make the wrong assumption about the
direction and update the wrong device stats.
Now we rely more on par->in/out (see the sketch after this list).

* Fix handling when qtaguid is used as "owner"
When qtaguid is used as an owner module, and sk_socket->file is
not there (happens when tunnels are involved), it would
incorrectly do a tag stats update.

* Correct debug messages.
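A sketch of the direction pick from the first item above (it mirrors the get_dev_and_dir() helper in the diff below; in later kernels these fields move into par->state):

```c
/* Sketch: trust par->in/par->out over skb->dev to classify direction. */
if (par->in) {
        el_dev = par->in;
        direction = IFS_RX;
} else if (par->out) {
        el_dev = par->out;
        direction = IFS_TX;
}
```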

Bug: 11687690
Change-Id: I2b1ff8bd7131969ce9e25f8291d83a6280b3ba7f
CRs-Fixed: 747810
Signed-off-by: JP Abgrall <jpa@google.com>
Git-commit: 2b71479d6f5fe8f33b335f713380f72037244395
Git-repo: https://www.codeaurora.org/cgit/quic/la/kernel/mediatek
[imaund@codeaurora.org: Resolved trivial context conflicts.]
Signed-off-by: Ian Maund <imaund@codeaurora.org>
[bflowers@codeaurora.org: Resolved merge conflicts]
Signed-off-by: Bryse Flowers <bflowers@codeaurora.org>
Signed-off-by: Chenbo Feng <fengc@google.com>

ANDROID: netfilter: xt_qtaguid: Use sk_uid to replace uid get from socket file

Retrieve the socket uid from the sk_uid field added to struct sock
instead of reading it from sk->sk_socket->file. This prevents packets
from being dropped when the socket file doesn't exist.
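A minimal sketch of the lookup change (illustrative; the old-scheme details are our assumption, sk_uid is the struct sock field named above):

```c
/* Sketch. Old scheme: credentials came via the socket file, which may
 * already be detached, causing the packet to be dropped: */
struct file *filp = sk->sk_socket ? sk->sk_socket->file : NULL;
kuid_t uid = filp ? filp->f_cred->fsuid : GLOBAL_ROOT_UID;

/* New scheme: the uid is cached on the sock itself: */
kuid_t sock_uid = sk->sk_uid;
```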

Bug: 37524657
Signed-off-by: Chenbo Feng <fengc@google.com>
Change-Id: Ic58239c1f9aa7e0eb1d4d1c09d40b845fd4e8e57

ANDROID: netfilter: xt_qtaguid: Fix 4.14 compilation

struct xt_action_param was changed: in, out, family and hooknum were
moved into struct nf_hook_state *state as in, out, pf and hook.

Also replace atomic_read() with refcount_read().
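An illustrative mapping for the API change described above:

```c
/* Sketch: field moves in struct xt_action_param for 4.14. */
dev_in  = par->state->in;             /* was: par->in */
dev_out = par->state->out;            /* was: par->out */
pf      = par->state->pf;             /* was: par->family */
hook    = par->state->hook;           /* was: par->hooknum */

refs = refcount_read(&sk->sk_refcnt); /* was: atomic_read(&sk->sk_refcnt) */
```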

Change-Id: If463bf84db08fe382baa825ca7818cab2150b60d
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>

ANDROID: qtaguid: Fix the UAF problem with tag_ref_tree

When multiple threads are trying to tag/delete the same socket at the
same time, there is a chance that the tag_ref_entry of the target socket
becomes NULL before the uid_tag_data entry is freed. It is caused by the
ctrl_cmd_tag() function not correctly grabbing the spinlocks when
tagging a socket.

Signed-off-by: Chenbo Feng <fengc@google.com>
Bug: 65853158
Change-Id: I5d89885918054cf835370a52bff2d693362ac5f0

ANDROID: xt_qtaguid: Remove unnecessary null checks to device's name

'name' will never be NULL since it isn't a plain pointer but an array
of char values.

../net/netfilter/xt_qtaguid.c:1195:27: warning: address of array
'(*el_dev)->name' will always evaluate to 'true'
[-Wpointer-bool-conversion]

if (unlikely(!(*el_dev)->name)) {

Change-Id: If3b25f17829b43e8a639193fb9cd04ae45947200
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
(cherry picked from android-4.4 commit 207b579e3db6fd0cb6fe40ba3e929635ad748d89)
Signed-off-by: Chenbo Feng <fengc@google.com>

Change Summary

Incremental Difference

--- a/include/linux/android_aid.h
+++ b/include/linux/android_aid.h
@@ -22,5 +22,7 @@
 #define AID_INET KGIDT_INIT(3003)
 #define AID_NET_RAW KGIDT_INIT(3004)
 #define AID_NET_ADMIN KGIDT_INIT(3005)
+#define AID_NET_BW_STATS KGIDT_INIT(3006) /* read bandwidth statistics */
+#define AID_NET_BW_ACCT KGIDT_INIT(3007) /* change bandwidth statistics accounting */

 #endif
--- /dev/null
+++ b/include/linux/netfilter/xt_qtaguid.h
@@ -0,0 +1,14 @@
+#ifndef _XT_QTAGUID_MATCH_H
+#define _XT_QTAGUID_MATCH_H
+
+/* For now we just replace the xt_owner.
+ * FIXME: make iptables aware of qtaguid. */
+#include <linux/netfilter/xt_owner.h>
+
+#define XT_QTAGUID_UID XT_OWNER_UID
+#define XT_QTAGUID_GID XT_OWNER_GID
+#define XT_QTAGUID_SOCKET XT_OWNER_SOCKET
+#define xt_qtaguid_match_info xt_owner_match_info
+
+int qtaguid_untag(struct socket *sock, bool kernel);
+#endif /* _XT_QTAGUID_MATCH_H */
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -89,6 +89,7 @@
 #include <linux/netfilter_ipv4.h>
 #include <linux/random.h>
 #include <linux/slab.h>
+#include <linux/netfilter/xt_qtaguid.h>

 #include <linux/uaccess.h>

@@ -426,6 +427,9 @@ int inet_release(struct socket *sock)
 	if (sk) {
 		long timeout;

+#ifdef CONFIG_NETFILTER_XT_MATCH_QTAGUID
+		qtaguid_untag(sock, true);
+#endif
 		/* Applications forget to leave groups before exiting */
 		ip_mc_drop_socket(sk);

--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1418,6 +1418,8 @@ config NETFILTER_XT_MATCH_OWNER
 	  based on who created the socket: the user or group. It is also
 	  possible to check whether a socket actually exists.

+	  Conflicts with '"quota, tag, uid" match'
+
 config NETFILTER_XT_MATCH_POLICY
 	tristate 'IPsec "policy" match support'
 	depends on XFRM
@@ -1451,6 +1453,22 @@ config NETFILTER_XT_MATCH_PKTTYPE

 	  To compile it as a module, choose M here. If unsure, say N.

+config NETFILTER_XT_MATCH_QTAGUID
+	bool '"quota, tag, owner" match and stats support'
+	depends on NETFILTER_XT_MATCH_SOCKET
+	depends on NETFILTER_XT_MATCH_OWNER=n
+	help
+	  This option replaces the `owner' match. In addition to matching
+	  on uid, it keeps stats based on a tag assigned to a socket.
+	  The full tag is comprised of a UID and an accounting tag.
+	  The tags are assignable to sockets from user space (e.g. a download
+	  manager can assign the socket to another UID for accounting).
+	  Stats and control are done via /proc/net/xt_qtaguid/.
+	  It replaces owner as it takes the same arguments, but should
+	  really be recognized by the iptables tool.
+
+	  If unsure, say `N'.
+
 config NETFILTER_XT_MATCH_QUOTA
 	tristate '"quota" match support'
 	depends on NETFILTER_ADVANCED
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -190,6 +190,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_CGROUP) += xt_cgroup.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_QTAGUID) += xt_qtaguid_print.o xt_qtaguid.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA2) += xt_quota2.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_RATEEST) += xt_rateest.o
--- /dev/null
+++ b/net/netfilter/xt_qtaguid.c
@@ -0,0 +1,3027 @@
1+/*
2+ * Kernel iptables module to track stats for packets based on user tags.
3+ *
4+ * (C) 2011 Google, Inc
5+ *
6+ * This program is free software; you can redistribute it and/or modify
7+ * it under the terms of the GNU General Public License version 2 as
8+ * published by the Free Software Foundation.
9+ */
10+
11+/*
12+ * There are run-time debug flags enabled via the debug_mask module param, or
13+ * via the DEFAULT_DEBUG_MASK. See xt_qtaguid_internal.h.
14+ */
15+#define DEBUG
16+
17+#include <linux/file.h>
18+#include <linux/inetdevice.h>
19+#include <linux/module.h>
20+#include <linux/miscdevice.h>
21+#include <linux/netfilter/x_tables.h>
22+#include <linux/netfilter/xt_qtaguid.h>
23+#include <linux/ratelimit.h>
24+#include <linux/seq_file.h>
25+#include <linux/skbuff.h>
26+#include <linux/workqueue.h>
27+#include <net/addrconf.h>
28+#include <net/sock.h>
29+#include <net/tcp.h>
30+#include <net/udp.h>
31+#include <net/netfilter/nf_socket.h>
32+
33+#if defined(CONFIG_IP6_NF_IPTABLES) || defined(CONFIG_IP6_NF_IPTABLES_MODULE)
34+#include <linux/netfilter_ipv6/ip6_tables.h>
35+#endif
36+
37+#include <linux/netfilter/xt_socket.h>
38+#include "xt_qtaguid_internal.h"
39+#include "xt_qtaguid_print.h"
40+#include "../../fs/proc/internal.h"
41+
42+/*
43+ * We only use the xt_socket funcs within a similar context to avoid unexpected
44+ * return values.
45+ */
46+#define XT_SOCKET_SUPPORTED_HOOKS \
47+ ((1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_IN))
48+
49+
50+static const char *module_procdirname = "xt_qtaguid";
51+static struct proc_dir_entry *xt_qtaguid_procdir;
52+
53+static unsigned int proc_iface_perms = S_IRUGO;
54+module_param_named(iface_perms, proc_iface_perms, uint, S_IRUGO | S_IWUSR);
55+
56+static struct proc_dir_entry *xt_qtaguid_stats_file;
57+static unsigned int proc_stats_perms = S_IRUGO;
58+module_param_named(stats_perms, proc_stats_perms, uint, S_IRUGO | S_IWUSR);
59+
60+static struct proc_dir_entry *xt_qtaguid_ctrl_file;
61+
62+/* Everybody can write. But proc_ctrl_write_limited is true by default which
63+ * limits what can be controlled. See the can_*() functions.
64+ */
65+static unsigned int proc_ctrl_perms = S_IRUGO | S_IWUGO;
66+module_param_named(ctrl_perms, proc_ctrl_perms, uint, S_IRUGO | S_IWUSR);
67+
68+/* Limited by default, so the gid of the ctrl and stats proc entries
69+ * will limit what can be done. See the can_*() functions.
70+ */
71+static bool proc_stats_readall_limited = true;
72+static bool proc_ctrl_write_limited = true;
73+
74+module_param_named(stats_readall_limited, proc_stats_readall_limited, bool,
75+ S_IRUGO | S_IWUSR);
76+module_param_named(ctrl_write_limited, proc_ctrl_write_limited, bool,
77+ S_IRUGO | S_IWUSR);
78+
79+/*
80+ * Limit the number of active tags (via socket tags) for a given UID.
81+ * Multiple processes could share the UID.
82+ */
83+static int max_sock_tags = DEFAULT_MAX_SOCK_TAGS;
84+module_param(max_sock_tags, int, S_IRUGO | S_IWUSR);
85+
86+/*
87+ * After the kernel has initiallized this module, it is still possible
88+ * to make it passive.
89+ * Setting passive to Y:
90+ * - the iface stats handling will not act on notifications.
91+ * - iptables matches will never match.
92+ * - ctrl commands silently succeed.
93+ * - stats are always empty.
94+ * This is mostly usefull when a bug is suspected.
95+ */
96+static bool module_passive;
97+module_param_named(passive, module_passive, bool, S_IRUGO | S_IWUSR);
98+
99+/*
100+ * Control how qtaguid data is tracked per proc/uid.
101+ * Setting tag_tracking_passive to Y:
102+ * - don't create proc specific structs to track tags
103+ * - don't check that active tag stats exceed some limits.
104+ * - don't clean up socket tags on process exits.
105+ * This is mostly usefull when a bug is suspected.
106+ */
107+static bool qtu_proc_handling_passive;
108+module_param_named(tag_tracking_passive, qtu_proc_handling_passive, bool,
109+ S_IRUGO | S_IWUSR);
110+
111+#define QTU_DEV_NAME "xt_qtaguid"
112+
113+uint qtaguid_debug_mask = DEFAULT_DEBUG_MASK;
114+module_param_named(debug_mask, qtaguid_debug_mask, uint, S_IRUGO | S_IWUSR);
115+
116+/*---------------------------------------------------------------------------*/
117+static const char *iface_stat_procdirname = "iface_stat";
118+static struct proc_dir_entry *iface_stat_procdir;
119+/*
120+ * The iface_stat_all* will go away once userspace gets use to the new fields
121+ * that have a format line.
122+ */
123+static const char *iface_stat_all_procfilename = "iface_stat_all";
124+static struct proc_dir_entry *iface_stat_all_procfile;
125+static const char *iface_stat_fmt_procfilename = "iface_stat_fmt";
126+static struct proc_dir_entry *iface_stat_fmt_procfile;
127+
128+
129+static LIST_HEAD(iface_stat_list);
130+static DEFINE_SPINLOCK(iface_stat_list_lock);
131+
132+static struct rb_root sock_tag_tree = RB_ROOT;
133+static DEFINE_SPINLOCK(sock_tag_list_lock);
134+
135+static struct rb_root tag_counter_set_tree = RB_ROOT;
136+static DEFINE_SPINLOCK(tag_counter_set_list_lock);
137+
138+static struct rb_root uid_tag_data_tree = RB_ROOT;
139+static DEFINE_SPINLOCK(uid_tag_data_tree_lock);
140+
141+static struct rb_root proc_qtu_data_tree = RB_ROOT;
142+/* No proc_qtu_data_tree_lock; use uid_tag_data_tree_lock */
143+
144+static struct qtaguid_event_counts qtu_events;
145+/*----------------------------------------------*/
146+static bool can_manipulate_uids(void)
147+{
148+ /* root pwnd */
149+ return in_egroup_p(xt_qtaguid_ctrl_file->gid)
150+ || unlikely(!from_kuid(&init_user_ns, current_fsuid())) || unlikely(!proc_ctrl_write_limited)
151+ || unlikely(uid_eq(current_fsuid(), xt_qtaguid_ctrl_file->uid));
152+}
153+
154+static bool can_impersonate_uid(kuid_t uid)
155+{
156+ return uid_eq(uid, current_fsuid()) || can_manipulate_uids();
157+}
158+
159+static bool can_read_other_uid_stats(kuid_t uid)
160+{
161+ /* root pwnd */
162+ return in_egroup_p(xt_qtaguid_stats_file->gid)
163+ || unlikely(!from_kuid(&init_user_ns, current_fsuid())) || uid_eq(uid, current_fsuid())
164+ || unlikely(!proc_stats_readall_limited)
165+ || unlikely(uid_eq(current_fsuid(), xt_qtaguid_ctrl_file->uid));
166+}
167+
168+static inline void dc_add_byte_packets(struct data_counters *counters, int set,
169+ enum ifs_tx_rx direction,
170+ enum ifs_proto ifs_proto,
171+ int bytes,
172+ int packets)
173+{
174+ counters->bpc[set][direction][ifs_proto].bytes += bytes;
175+ counters->bpc[set][direction][ifs_proto].packets += packets;
176+}
177+
178+static struct tag_node *tag_node_tree_search(struct rb_root *root, tag_t tag)
179+{
180+ struct rb_node *node = root->rb_node;
181+
182+ while (node) {
183+ struct tag_node *data = rb_entry(node, struct tag_node, node);
184+ int result;
185+ RB_DEBUG("qtaguid: tag_node_tree_search(0x%llx): "
186+ " node=%p data=%p\n", tag, node, data);
187+ result = tag_compare(tag, data->tag);
188+ RB_DEBUG("qtaguid: tag_node_tree_search(0x%llx): "
189+ " data.tag=0x%llx (uid=%u) res=%d\n",
190+ tag, data->tag, get_uid_from_tag(data->tag), result);
191+ if (result < 0)
192+ node = node->rb_left;
193+ else if (result > 0)
194+ node = node->rb_right;
195+ else
196+ return data;
197+ }
198+ return NULL;
199+}
200+
201+static void tag_node_tree_insert(struct tag_node *data, struct rb_root *root)
202+{
203+ struct rb_node **new = &(root->rb_node), *parent = NULL;
204+
205+ /* Figure out where to put new node */
206+ while (*new) {
207+ struct tag_node *this = rb_entry(*new, struct tag_node,
208+ node);
209+ int result = tag_compare(data->tag, this->tag);
210+ RB_DEBUG("qtaguid: %s(): tag=0x%llx"
211+ " (uid=%u)\n", __func__,
212+ this->tag,
213+ get_uid_from_tag(this->tag));
214+ parent = *new;
215+ if (result < 0)
216+ new = &((*new)->rb_left);
217+ else if (result > 0)
218+ new = &((*new)->rb_right);
219+ else
220+ BUG();
221+ }
222+
223+ /* Add new node and rebalance tree. */
224+ rb_link_node(&data->node, parent, new);
225+ rb_insert_color(&data->node, root);
226+}
227+
228+static void tag_stat_tree_insert(struct tag_stat *data, struct rb_root *root)
229+{
230+ tag_node_tree_insert(&data->tn, root);
231+}
232+
233+static struct tag_stat *tag_stat_tree_search(struct rb_root *root, tag_t tag)
234+{
235+ struct tag_node *node = tag_node_tree_search(root, tag);
236+ if (!node)
237+ return NULL;
238+ return rb_entry(&node->node, struct tag_stat, tn.node);
239+}
240+
241+static void tag_counter_set_tree_insert(struct tag_counter_set *data,
242+ struct rb_root *root)
243+{
244+ tag_node_tree_insert(&data->tn, root);
245+}
246+
247+static struct tag_counter_set *tag_counter_set_tree_search(struct rb_root *root,
248+ tag_t tag)
249+{
250+ struct tag_node *node = tag_node_tree_search(root, tag);
251+ if (!node)
252+ return NULL;
253+ return rb_entry(&node->node, struct tag_counter_set, tn.node);
254+
255+}
256+
257+static void tag_ref_tree_insert(struct tag_ref *data, struct rb_root *root)
258+{
259+ tag_node_tree_insert(&data->tn, root);
260+}
261+
262+static struct tag_ref *tag_ref_tree_search(struct rb_root *root, tag_t tag)
263+{
264+ struct tag_node *node = tag_node_tree_search(root, tag);
265+ if (!node)
266+ return NULL;
267+ return rb_entry(&node->node, struct tag_ref, tn.node);
268+}
269+
270+static struct sock_tag *sock_tag_tree_search(struct rb_root *root,
271+ const struct sock *sk)
272+{
273+ struct rb_node *node = root->rb_node;
274+
275+ while (node) {
276+ struct sock_tag *data = rb_entry(node, struct sock_tag,
277+ sock_node);
278+ if (sk < data->sk)
279+ node = node->rb_left;
280+ else if (sk > data->sk)
281+ node = node->rb_right;
282+ else
283+ return data;
284+ }
285+ return NULL;
286+}
287+
288+static void sock_tag_tree_insert(struct sock_tag *data, struct rb_root *root)
289+{
290+ struct rb_node **new = &(root->rb_node), *parent = NULL;
291+
292+ /* Figure out where to put new node */
293+ while (*new) {
294+ struct sock_tag *this = rb_entry(*new, struct sock_tag,
295+ sock_node);
296+ parent = *new;
297+ if (data->sk < this->sk)
298+ new = &((*new)->rb_left);
299+ else if (data->sk > this->sk)
300+ new = &((*new)->rb_right);
301+ else
302+ BUG();
303+ }
304+
305+ /* Add new node and rebalance tree. */
306+ rb_link_node(&data->sock_node, parent, new);
307+ rb_insert_color(&data->sock_node, root);
308+}
309+
310+static void sock_tag_tree_erase(struct rb_root *st_to_free_tree)
311+{
312+ struct rb_node *node;
313+ struct sock_tag *st_entry;
314+
315+ node = rb_first(st_to_free_tree);
316+ while (node) {
317+ st_entry = rb_entry(node, struct sock_tag, sock_node);
318+ node = rb_next(node);
319+ CT_DEBUG("qtaguid: %s(): "
320+ "erase st: sk=%p tag=0x%llx (uid=%u)\n", __func__,
321+ st_entry->sk,
322+ st_entry->tag,
323+ get_uid_from_tag(st_entry->tag));
324+ rb_erase(&st_entry->sock_node, st_to_free_tree);
325+ sock_put(st_entry->sk);
326+ kfree(st_entry);
327+ }
328+}
329+
330+static struct proc_qtu_data *proc_qtu_data_tree_search(struct rb_root *root,
331+ const pid_t pid)
332+{
333+ struct rb_node *node = root->rb_node;
334+
335+ while (node) {
336+ struct proc_qtu_data *data = rb_entry(node,
337+ struct proc_qtu_data,
338+ node);
339+ if (pid < data->pid)
340+ node = node->rb_left;
341+ else if (pid > data->pid)
342+ node = node->rb_right;
343+ else
344+ return data;
345+ }
346+ return NULL;
347+}
348+
349+static void proc_qtu_data_tree_insert(struct proc_qtu_data *data,
350+ struct rb_root *root)
351+{
352+ struct rb_node **new = &(root->rb_node), *parent = NULL;
353+
354+ /* Figure out where to put new node */
355+ while (*new) {
356+ struct proc_qtu_data *this = rb_entry(*new,
357+ struct proc_qtu_data,
358+ node);
359+ parent = *new;
360+ if (data->pid < this->pid)
361+ new = &((*new)->rb_left);
362+ else if (data->pid > this->pid)
363+ new = &((*new)->rb_right);
364+ else
365+ BUG();
366+ }
367+
368+ /* Add new node and rebalance tree. */
369+ rb_link_node(&data->node, parent, new);
370+ rb_insert_color(&data->node, root);
371+}
372+
373+static void uid_tag_data_tree_insert(struct uid_tag_data *data,
374+ struct rb_root *root)
375+{
376+ struct rb_node **new = &(root->rb_node), *parent = NULL;
377+
378+ /* Figure out where to put new node */
379+ while (*new) {
380+ struct uid_tag_data *this = rb_entry(*new,
381+ struct uid_tag_data,
382+ node);
383+ parent = *new;
384+ if (data->uid < this->uid)
385+ new = &((*new)->rb_left);
386+ else if (data->uid > this->uid)
387+ new = &((*new)->rb_right);
388+ else
389+ BUG();
390+ }
391+
392+ /* Add new node and rebalance tree. */
393+ rb_link_node(&data->node, parent, new);
394+ rb_insert_color(&data->node, root);
395+}
396+
397+static struct uid_tag_data *uid_tag_data_tree_search(struct rb_root *root,
398+ uid_t uid)
399+{
400+ struct rb_node *node = root->rb_node;
401+
402+ while (node) {
403+ struct uid_tag_data *data = rb_entry(node,
404+ struct uid_tag_data,
405+ node);
406+ if (uid < data->uid)
407+ node = node->rb_left;
408+ else if (uid > data->uid)
409+ node = node->rb_right;
410+ else
411+ return data;
412+ }
413+ return NULL;
414+}
415+
416+/*
417+ * Allocates a new uid_tag_data struct if needed.
418+ * Returns a pointer to the found or allocated uid_tag_data.
419+ * Returns a PTR_ERR on failures, and lock is not held.
420+ * If found is not NULL:
421+ * sets *found to true if not allocated.
422+ * sets *found to false if allocated.
423+ */
424+struct uid_tag_data *get_uid_data(uid_t uid, bool *found_res)
425+{
426+ struct uid_tag_data *utd_entry;
427+
428+ /* Look for top level uid_tag_data for the UID */
429+ utd_entry = uid_tag_data_tree_search(&uid_tag_data_tree, uid);
430+ DR_DEBUG("qtaguid: get_uid_data(%u) utd=%p\n", uid, utd_entry);
431+
432+ if (found_res)
433+ *found_res = utd_entry;
434+ if (utd_entry)
435+ return utd_entry;
436+
437+ utd_entry = kzalloc(sizeof(*utd_entry), GFP_ATOMIC);
438+ if (!utd_entry) {
439+ pr_err("qtaguid: get_uid_data(%u): "
440+ "tag data alloc failed\n", uid);
441+ return ERR_PTR(-ENOMEM);
442+ }
443+
444+ utd_entry->uid = uid;
445+ utd_entry->tag_ref_tree = RB_ROOT;
446+ uid_tag_data_tree_insert(utd_entry, &uid_tag_data_tree);
447+ DR_DEBUG("qtaguid: get_uid_data(%u) new utd=%p\n", uid, utd_entry);
448+ return utd_entry;
449+}
450+
451+/* Never returns NULL. Either PTR_ERR or a valid ptr. */
452+static struct tag_ref *new_tag_ref(tag_t new_tag,
453+ struct uid_tag_data *utd_entry)
454+{
455+ struct tag_ref *tr_entry;
456+ int res;
457+
458+ if (utd_entry->num_active_tags + 1 > max_sock_tags) {
459+ pr_info("qtaguid: new_tag_ref(0x%llx): "
460+ "tag ref alloc quota exceeded. max=%d\n",
461+ new_tag, max_sock_tags);
462+ res = -EMFILE;
463+ goto err_res;
464+
465+ }
466+
467+ tr_entry = kzalloc(sizeof(*tr_entry), GFP_ATOMIC);
468+ if (!tr_entry) {
469+ pr_err("qtaguid: new_tag_ref(0x%llx): "
470+ "tag ref alloc failed\n",
471+ new_tag);
472+ res = -ENOMEM;
473+ goto err_res;
474+ }
475+ tr_entry->tn.tag = new_tag;
476+ /* tr_entry->num_sock_tags handled by caller */
477+ utd_entry->num_active_tags++;
478+ tag_ref_tree_insert(tr_entry, &utd_entry->tag_ref_tree);
479+ DR_DEBUG("qtaguid: new_tag_ref(0x%llx): "
480+ " inserted new tag ref %p\n",
481+ new_tag, tr_entry);
482+ return tr_entry;
483+
484+err_res:
485+ return ERR_PTR(res);
486+}
487+
488+static struct tag_ref *lookup_tag_ref(tag_t full_tag,
489+ struct uid_tag_data **utd_res)
490+{
491+ struct uid_tag_data *utd_entry;
492+ struct tag_ref *tr_entry;
493+ bool found_utd;
494+ uid_t uid = get_uid_from_tag(full_tag);
495+
496+ DR_DEBUG("qtaguid: lookup_tag_ref(tag=0x%llx (uid=%u))\n",
497+ full_tag, uid);
498+
499+ utd_entry = get_uid_data(uid, &found_utd);
500+ if (IS_ERR_OR_NULL(utd_entry)) {
501+ if (utd_res)
502+ *utd_res = utd_entry;
503+ return NULL;
504+ }
505+
506+ tr_entry = tag_ref_tree_search(&utd_entry->tag_ref_tree, full_tag);
507+ if (utd_res)
508+ *utd_res = utd_entry;
509+ DR_DEBUG("qtaguid: lookup_tag_ref(0x%llx) utd_entry=%p tr_entry=%p\n",
510+ full_tag, utd_entry, tr_entry);
511+ return tr_entry;
512+}
513+
514+/* Never returns NULL. Either PTR_ERR or a valid ptr. */
515+static struct tag_ref *get_tag_ref(tag_t full_tag,
516+ struct uid_tag_data **utd_res)
517+{
518+ struct uid_tag_data *utd_entry;
519+ struct tag_ref *tr_entry;
520+
521+ DR_DEBUG("qtaguid: get_tag_ref(0x%llx)\n",
522+ full_tag);
523+ tr_entry = lookup_tag_ref(full_tag, &utd_entry);
524+ BUG_ON(IS_ERR_OR_NULL(utd_entry));
525+ if (!tr_entry)
526+ tr_entry = new_tag_ref(full_tag, utd_entry);
527+
528+ if (utd_res)
529+ *utd_res = utd_entry;
530+ DR_DEBUG("qtaguid: get_tag_ref(0x%llx) utd=%p tr=%p\n",
531+ full_tag, utd_entry, tr_entry);
532+ return tr_entry;
533+}
534+
535+/* Checks and maybe frees the UID Tag Data entry */
536+static void put_utd_entry(struct uid_tag_data *utd_entry)
537+{
538+ /* Are we done with the UID tag data entry? */
539+ if (RB_EMPTY_ROOT(&utd_entry->tag_ref_tree) &&
540+ !utd_entry->num_pqd) {
541+ DR_DEBUG("qtaguid: %s(): "
542+ "erase utd_entry=%p uid=%u "
543+ "by pid=%u tgid=%u uid=%u\n", __func__,
544+ utd_entry, utd_entry->uid,
545+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
546+ BUG_ON(utd_entry->num_active_tags);
547+ rb_erase(&utd_entry->node, &uid_tag_data_tree);
548+ kfree(utd_entry);
549+ } else {
550+ DR_DEBUG("qtaguid: %s(): "
551+ "utd_entry=%p still has %d tags %d proc_qtu_data\n",
552+ __func__, utd_entry, utd_entry->num_active_tags,
553+ utd_entry->num_pqd);
554+ BUG_ON(!(utd_entry->num_active_tags ||
555+ utd_entry->num_pqd));
556+ }
557+}
558+
559+/*
560+ * If no sock_tags are using this tag_ref,
561+ * decrements refcount of utd_entry, removes tr_entry
562+ * from utd_entry->tag_ref_tree and frees.
563+ */
564+static void free_tag_ref_from_utd_entry(struct tag_ref *tr_entry,
565+ struct uid_tag_data *utd_entry)
566+{
567+ DR_DEBUG("qtaguid: %s(): %p tag=0x%llx (uid=%u)\n", __func__,
568+ tr_entry, tr_entry->tn.tag,
569+ get_uid_from_tag(tr_entry->tn.tag));
570+ if (!tr_entry->num_sock_tags) {
571+ BUG_ON(!utd_entry->num_active_tags);
572+ utd_entry->num_active_tags--;
573+ rb_erase(&tr_entry->tn.node, &utd_entry->tag_ref_tree);
574+ DR_DEBUG("qtaguid: %s(): erased %p\n", __func__, tr_entry);
575+ kfree(tr_entry);
576+ }
577+}
578+
579+static void put_tag_ref_tree(tag_t full_tag, struct uid_tag_data *utd_entry)
580+{
581+ struct rb_node *node;
582+ struct tag_ref *tr_entry;
583+ tag_t acct_tag;
584+
585+ DR_DEBUG("qtaguid: %s(tag=0x%llx (uid=%u))\n", __func__,
586+ full_tag, get_uid_from_tag(full_tag));
587+ acct_tag = get_atag_from_tag(full_tag);
588+ node = rb_first(&utd_entry->tag_ref_tree);
589+ while (node) {
590+ tr_entry = rb_entry(node, struct tag_ref, tn.node);
591+ node = rb_next(node);
592+ if (!acct_tag || tr_entry->tn.tag == full_tag)
593+ free_tag_ref_from_utd_entry(tr_entry, utd_entry);
594+ }
595+}
596+
597+static ssize_t read_proc_u64(struct file *file, char __user *buf,
598+ size_t size, loff_t *ppos)
599+{
600+ uint64_t *valuep = PDE_DATA(file_inode(file));
601+ char tmp[24];
602+ size_t tmp_size;
603+
604+ tmp_size = scnprintf(tmp, sizeof(tmp), "%llu\n", *valuep);
605+ return simple_read_from_buffer(buf, size, ppos, tmp, tmp_size);
606+}
607+
608+static ssize_t read_proc_bool(struct file *file, char __user *buf,
609+ size_t size, loff_t *ppos)
610+{
611+ bool *valuep = PDE_DATA(file_inode(file));
612+ char tmp[24];
613+ size_t tmp_size;
614+
615+ tmp_size = scnprintf(tmp, sizeof(tmp), "%u\n", *valuep);
616+ return simple_read_from_buffer(buf, size, ppos, tmp, tmp_size);
617+}
618+
619+static int get_active_counter_set(tag_t tag)
620+{
621+ int active_set = 0;
622+ struct tag_counter_set *tcs;
623+
624+ MT_DEBUG("qtaguid: get_active_counter_set(tag=0x%llx)"
625+ " (uid=%u)\n",
626+ tag, get_uid_from_tag(tag));
627+ /* For now we only handle UID tags for active sets */
628+ tag = get_utag_from_tag(tag);
629+ spin_lock_bh(&tag_counter_set_list_lock);
630+ tcs = tag_counter_set_tree_search(&tag_counter_set_tree, tag);
631+ if (tcs)
632+ active_set = tcs->active_set;
633+ spin_unlock_bh(&tag_counter_set_list_lock);
634+ return active_set;
635+}
636+
637+/*
638+ * Find the entry for tracking the specified interface.
639+ * Caller must hold iface_stat_list_lock
640+ */
641+static struct iface_stat *get_iface_entry(const char *ifname)
642+{
643+ struct iface_stat *iface_entry;
644+
645+ /* Find the entry for tracking the specified tag within the interface */
646+ if (ifname == NULL) {
647+ pr_info("qtaguid: iface_stat: get() NULL device name\n");
648+ return NULL;
649+ }
650+
651+ /* Iterate over interfaces */
652+ list_for_each_entry(iface_entry, &iface_stat_list, list) {
653+ if (!strcmp(ifname, iface_entry->ifname))
654+ goto done;
655+ }
656+ iface_entry = NULL;
657+done:
658+ return iface_entry;
659+}
660+
661+/* This is for fmt2 only */
662+static void pp_iface_stat_header(struct seq_file *m)
663+{
664+ seq_puts(m,
665+ "ifname "
666+ "total_skb_rx_bytes total_skb_rx_packets "
667+ "total_skb_tx_bytes total_skb_tx_packets "
668+ "rx_tcp_bytes rx_tcp_packets "
669+ "rx_udp_bytes rx_udp_packets "
670+ "rx_other_bytes rx_other_packets "
671+ "tx_tcp_bytes tx_tcp_packets "
672+ "tx_udp_bytes tx_udp_packets "
673+ "tx_other_bytes tx_other_packets\n"
674+ );
675+}
676+
677+static void pp_iface_stat_line(struct seq_file *m,
678+ struct iface_stat *iface_entry)
679+{
680+ struct data_counters *cnts;
681+ int cnt_set = 0; /* We only use one set for the device */
682+ cnts = &iface_entry->totals_via_skb;
683+ seq_printf(m, "%s %llu %llu %llu %llu %llu %llu %llu %llu "
684+ "%llu %llu %llu %llu %llu %llu %llu %llu\n",
685+ iface_entry->ifname,
686+ dc_sum_bytes(cnts, cnt_set, IFS_RX),
687+ dc_sum_packets(cnts, cnt_set, IFS_RX),
688+ dc_sum_bytes(cnts, cnt_set, IFS_TX),
689+ dc_sum_packets(cnts, cnt_set, IFS_TX),
690+ cnts->bpc[cnt_set][IFS_RX][IFS_TCP].bytes,
691+ cnts->bpc[cnt_set][IFS_RX][IFS_TCP].packets,
692+ cnts->bpc[cnt_set][IFS_RX][IFS_UDP].bytes,
693+ cnts->bpc[cnt_set][IFS_RX][IFS_UDP].packets,
694+ cnts->bpc[cnt_set][IFS_RX][IFS_PROTO_OTHER].bytes,
695+ cnts->bpc[cnt_set][IFS_RX][IFS_PROTO_OTHER].packets,
696+ cnts->bpc[cnt_set][IFS_TX][IFS_TCP].bytes,
697+ cnts->bpc[cnt_set][IFS_TX][IFS_TCP].packets,
698+ cnts->bpc[cnt_set][IFS_TX][IFS_UDP].bytes,
699+ cnts->bpc[cnt_set][IFS_TX][IFS_UDP].packets,
700+ cnts->bpc[cnt_set][IFS_TX][IFS_PROTO_OTHER].bytes,
701+ cnts->bpc[cnt_set][IFS_TX][IFS_PROTO_OTHER].packets);
702+}
703+
704+struct proc_iface_stat_fmt_info {
705+ int fmt;
706+};
707+
708+static void *iface_stat_fmt_proc_start(struct seq_file *m, loff_t *pos)
709+{
710+ struct proc_iface_stat_fmt_info *p = m->private;
711+ loff_t n = *pos;
712+
713+ /*
714+ * This lock will prevent iface_stat_update() from changing active,
715+ * and in turn prevent an interface from unregistering itself.
716+ */
717+ spin_lock_bh(&iface_stat_list_lock);
718+
719+ if (unlikely(module_passive))
720+ return NULL;
721+
722+ if (!n && p->fmt == 2)
723+ pp_iface_stat_header(m);
724+
725+ return seq_list_start(&iface_stat_list, n);
726+}
727+
728+static void *iface_stat_fmt_proc_next(struct seq_file *m, void *p, loff_t *pos)
729+{
730+ return seq_list_next(p, &iface_stat_list, pos);
731+}
732+
733+static void iface_stat_fmt_proc_stop(struct seq_file *m, void *p)
734+{
735+ spin_unlock_bh(&iface_stat_list_lock);
736+}
737+
738+static int iface_stat_fmt_proc_show(struct seq_file *m, void *v)
739+{
740+ struct proc_iface_stat_fmt_info *p = m->private;
741+ struct iface_stat *iface_entry;
742+ struct rtnl_link_stats64 dev_stats, *stats;
743+ struct rtnl_link_stats64 no_dev_stats = {0};
744+
745+
746+ CT_DEBUG("qtaguid:proc iface_stat_fmt pid=%u tgid=%u uid=%u\n",
747+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
748+
749+ iface_entry = list_entry(v, struct iface_stat, list);
750+
751+ if (iface_entry->active) {
752+ stats = dev_get_stats(iface_entry->net_dev,
753+ &dev_stats);
754+ } else {
755+ stats = &no_dev_stats;
756+ }
757+ /*
758+ * If the meaning of the data changes, then update the fmtX
759+ * string.
760+ */
761+ if (p->fmt == 1) {
762+ seq_printf(m, "%s %d %llu %llu %llu %llu %llu %llu %llu %llu\n",
763+ iface_entry->ifname,
764+ iface_entry->active,
765+ iface_entry->totals_via_dev[IFS_RX].bytes,
766+ iface_entry->totals_via_dev[IFS_RX].packets,
767+ iface_entry->totals_via_dev[IFS_TX].bytes,
768+ iface_entry->totals_via_dev[IFS_TX].packets,
769+ stats->rx_bytes, stats->rx_packets,
770+ stats->tx_bytes, stats->tx_packets
771+ );
772+ } else {
773+ pp_iface_stat_line(m, iface_entry);
774+ }
775+ return 0;
776+}
777+
778+static const struct file_operations read_u64_fops = {
779+ .read = read_proc_u64,
780+ .llseek = default_llseek,
781+};
782+
783+static const struct file_operations read_bool_fops = {
784+ .read = read_proc_bool,
785+ .llseek = default_llseek,
786+};
787+
788+static void iface_create_proc_worker(struct work_struct *work)
789+{
790+ struct proc_dir_entry *proc_entry;
791+ struct iface_stat_work *isw = container_of(work, struct iface_stat_work,
792+ iface_work);
793+ struct iface_stat *new_iface = isw->iface_entry;
794+
795+ /* iface_entries are not deleted, so safe to manipulate. */
796+ proc_entry = proc_mkdir(new_iface->ifname, iface_stat_procdir);
797+ if (IS_ERR_OR_NULL(proc_entry)) {
798+ pr_err("qtaguid: iface_stat: create_proc(): alloc failed.\n");
799+ kfree(isw);
800+ return;
801+ }
802+
803+ new_iface->proc_ptr = proc_entry;
804+
805+ proc_create_data("tx_bytes", proc_iface_perms, proc_entry,
806+ &read_u64_fops,
807+ &new_iface->totals_via_dev[IFS_TX].bytes);
808+ proc_create_data("rx_bytes", proc_iface_perms, proc_entry,
809+ &read_u64_fops,
810+ &new_iface->totals_via_dev[IFS_RX].bytes);
811+ proc_create_data("tx_packets", proc_iface_perms, proc_entry,
812+ &read_u64_fops,
813+ &new_iface->totals_via_dev[IFS_TX].packets);
814+ proc_create_data("rx_packets", proc_iface_perms, proc_entry,
815+ &read_u64_fops,
816+ &new_iface->totals_via_dev[IFS_RX].packets);
817+ proc_create_data("active", proc_iface_perms, proc_entry,
818+ &read_bool_fops, &new_iface->active);
819+
820+ IF_DEBUG("qtaguid: iface_stat: create_proc(): done "
821+ "entry=%p dev=%s\n", new_iface, new_iface->ifname);
822+ kfree(isw);
823+}
824+
825+/*
826+ * Will set the entry's active state, and
827+ * update the net_dev accordingly also.
828+ */
829+static void _iface_stat_set_active(struct iface_stat *entry,
830+ struct net_device *net_dev,
831+ bool activate)
832+{
833+ if (activate) {
834+ entry->net_dev = net_dev;
835+ entry->active = true;
836+ IF_DEBUG("qtaguid: %s(%s): "
837+ "enable tracking. rfcnt=%d\n", __func__,
838+ entry->ifname,
839+ __this_cpu_read(*net_dev->pcpu_refcnt));
840+ } else {
841+ entry->active = false;
842+ entry->net_dev = NULL;
843+ IF_DEBUG("qtaguid: %s(%s): "
844+ "disable tracking. rfcnt=%d\n", __func__,
845+ entry->ifname,
846+ __this_cpu_read(*net_dev->pcpu_refcnt));
847+
848+ }
849+}
850+
851+/* Caller must hold iface_stat_list_lock */
852+static struct iface_stat *iface_alloc(struct net_device *net_dev)
853+{
854+ struct iface_stat *new_iface;
855+ struct iface_stat_work *isw;
856+
857+ new_iface = kzalloc(sizeof(*new_iface), GFP_ATOMIC);
858+ if (new_iface == NULL) {
859+ pr_err("qtaguid: iface_stat: create(%s): "
860+ "iface_stat alloc failed\n", net_dev->name);
861+ return NULL;
862+ }
863+ new_iface->ifname = kstrdup(net_dev->name, GFP_ATOMIC);
864+ if (new_iface->ifname == NULL) {
865+ pr_err("qtaguid: iface_stat: create(%s): "
866+ "ifname alloc failed\n", net_dev->name);
867+ kfree(new_iface);
868+ return NULL;
869+ }
870+ spin_lock_init(&new_iface->tag_stat_list_lock);
871+ new_iface->tag_stat_tree = RB_ROOT;
872+ _iface_stat_set_active(new_iface, net_dev, true);
873+
874+ /*
875+ * ipv6 notifier chains are atomic :(
876+ * No create_proc_read_entry() for you!
877+ */
878+ isw = kmalloc(sizeof(*isw), GFP_ATOMIC);
879+ if (!isw) {
880+ pr_err("qtaguid: iface_stat: create(%s): "
881+ "work alloc failed\n", new_iface->ifname);
882+ _iface_stat_set_active(new_iface, net_dev, false);
883+ kfree(new_iface->ifname);
884+ kfree(new_iface);
885+ return NULL;
886+ }
887+ isw->iface_entry = new_iface;
888+ INIT_WORK(&isw->iface_work, iface_create_proc_worker);
889+ schedule_work(&isw->iface_work);
890+ list_add(&new_iface->list, &iface_stat_list);
891+ return new_iface;
892+}
893+
894+static void iface_check_stats_reset_and_adjust(struct net_device *net_dev,
895+ struct iface_stat *iface)
896+{
897+ struct rtnl_link_stats64 dev_stats, *stats;
898+ bool stats_rewound;
899+
900+ stats = dev_get_stats(net_dev, &dev_stats);
901+ /* No empty packets */
902+ stats_rewound =
903+ (stats->rx_bytes < iface->last_known[IFS_RX].bytes)
904+ || (stats->tx_bytes < iface->last_known[IFS_TX].bytes);
905+
906+ IF_DEBUG("qtaguid: %s(%s): iface=%p netdev=%p "
907+ "bytes rx/tx=%llu/%llu "
908+ "active=%d last_known=%d "
909+ "stats_rewound=%d\n", __func__,
910+ net_dev ? net_dev->name : "?",
911+ iface, net_dev,
912+ stats->rx_bytes, stats->tx_bytes,
913+ iface->active, iface->last_known_valid, stats_rewound);
914+
915+ if (iface->active && iface->last_known_valid && stats_rewound) {
916+ pr_warn_once("qtaguid: iface_stat: %s(%s): "
917+ "iface reset its stats unexpectedly\n", __func__,
918+ net_dev->name);
919+
920+ iface->totals_via_dev[IFS_TX].bytes +=
921+ iface->last_known[IFS_TX].bytes;
922+ iface->totals_via_dev[IFS_TX].packets +=
923+ iface->last_known[IFS_TX].packets;
924+ iface->totals_via_dev[IFS_RX].bytes +=
925+ iface->last_known[IFS_RX].bytes;
926+ iface->totals_via_dev[IFS_RX].packets +=
927+ iface->last_known[IFS_RX].packets;
928+ iface->last_known_valid = false;
929+ IF_DEBUG("qtaguid: %s(%s): iface=%p "
930+ "used last known bytes rx/tx=%llu/%llu\n", __func__,
931+ iface->ifname, iface, iface->last_known[IFS_RX].bytes,
932+ iface->last_known[IFS_TX].bytes);
933+ }
934+}
935+
936+/*
937+ * Create a new entry for tracking the specified interface.
938+ * Do nothing if the entry already exists.
939+ * Called when an interface is configured with a valid IP address.
940+ */
941+static void iface_stat_create(struct net_device *net_dev,
942+ struct in_ifaddr *ifa)
943+{
944+ struct in_device *in_dev = NULL;
945+ const char *ifname;
946+ struct iface_stat *entry;
947+ __be32 ipaddr = 0;
948+ struct iface_stat *new_iface;
949+
950+ IF_DEBUG("qtaguid: iface_stat: create(%s): ifa=%p netdev=%p\n",
951+ net_dev ? net_dev->name : "?",
952+ ifa, net_dev);
953+ if (!net_dev) {
954+ pr_err("qtaguid: iface_stat: create(): no net dev\n");
955+ return;
956+ }
957+
958+ ifname = net_dev->name;
959+ if (!ifa) {
960+ in_dev = in_dev_get(net_dev);
961+ if (!in_dev) {
962+ pr_err("qtaguid: iface_stat: create(%s): no inet dev\n",
963+ ifname);
964+ return;
965+ }
966+ IF_DEBUG("qtaguid: iface_stat: create(%s): in_dev=%p\n",
967+ ifname, in_dev);
968+ for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next) {
969+ IF_DEBUG("qtaguid: iface_stat: create(%s): "
970+ "ifa=%p ifa_label=%s\n",
971+ ifname, ifa, ifa->ifa_label);
972+ if (!strcmp(ifname, ifa->ifa_label))
973+ break;
974+ }
975+ }
976+
977+ if (!ifa) {
978+ IF_DEBUG("qtaguid: iface_stat: create(%s): no matching IP\n",
979+ ifname);
980+ goto done_put;
981+ }
982+ ipaddr = ifa->ifa_local;
983+
984+ spin_lock_bh(&iface_stat_list_lock);
985+ entry = get_iface_entry(ifname);
986+ if (entry != NULL) {
987+ IF_DEBUG("qtaguid: iface_stat: create(%s): entry=%p\n",
988+ ifname, entry);
989+ iface_check_stats_reset_and_adjust(net_dev, entry);
990+ _iface_stat_set_active(entry, net_dev, true);
991+ IF_DEBUG("qtaguid: %s(%s): "
992+ "tracking now %d on ip=%pI4\n", __func__,
993+ entry->ifname, true, &ipaddr);
994+ goto done_unlock_put;
995+ }
996+
997+ new_iface = iface_alloc(net_dev);
998+ IF_DEBUG("qtaguid: iface_stat: create(%s): done "
999+ "entry=%p ip=%pI4\n", ifname, new_iface, &ipaddr);
1000+done_unlock_put:
1001+ spin_unlock_bh(&iface_stat_list_lock);
1002+done_put:
1003+ if (in_dev)
1004+ in_dev_put(in_dev);
1005+}
1006+
1007+static void iface_stat_create_ipv6(struct net_device *net_dev,
1008+ struct inet6_ifaddr *ifa)
1009+{
1010+ struct in_device *in_dev;
1011+ const char *ifname;
1012+ struct iface_stat *entry;
1013+ struct iface_stat *new_iface;
1014+ int addr_type;
1015+
1016+ IF_DEBUG("qtaguid: iface_stat: create6(): ifa=%p netdev=%p->name=%s\n",
1017+ ifa, net_dev, net_dev ? net_dev->name : "");
1018+ if (!net_dev) {
1019+ pr_err("qtaguid: iface_stat: create6(): no net dev!\n");
1020+ return;
1021+ }
1022+ ifname = net_dev->name;
1023+
1024+ in_dev = in_dev_get(net_dev);
1025+ if (!in_dev) {
1026+ pr_err("qtaguid: iface_stat: create6(%s): no inet dev\n",
1027+ ifname);
1028+ return;
1029+ }
1030+
1031+ IF_DEBUG("qtaguid: iface_stat: create6(%s): in_dev=%p\n",
1032+ ifname, in_dev);
1033+
1034+ if (!ifa) {
1035+ IF_DEBUG("qtaguid: iface_stat: create6(%s): no matching IP\n",
1036+ ifname);
1037+ goto done_put;
1038+ }
1039+ addr_type = ipv6_addr_type(&ifa->addr);
1040+
1041+ spin_lock_bh(&iface_stat_list_lock);
1042+ entry = get_iface_entry(ifname);
1043+ if (entry != NULL) {
1044+ IF_DEBUG("qtaguid: %s(%s): entry=%p\n", __func__,
1045+ ifname, entry);
1046+ iface_check_stats_reset_and_adjust(net_dev, entry);
1047+ _iface_stat_set_active(entry, net_dev, true);
1048+ IF_DEBUG("qtaguid: %s(%s): "
1049+ "tracking now %d on ip=%pI6c\n", __func__,
1050+ entry->ifname, true, &ifa->addr);
1051+ goto done_unlock_put;
1052+ }
1053+
1054+ new_iface = iface_alloc(net_dev);
1055+ IF_DEBUG("qtaguid: iface_stat: create6(%s): done "
1056+ "entry=%p ip=%pI6c\n", ifname, new_iface, &ifa->addr);
1057+
1058+done_unlock_put:
1059+ spin_unlock_bh(&iface_stat_list_lock);
1060+done_put:
1061+ in_dev_put(in_dev);
1062+}
1063+
1064+static struct sock_tag *get_sock_stat_nl(const struct sock *sk)
1065+{
1066+ MT_DEBUG("qtaguid: get_sock_stat_nl(sk=%p)\n", sk);
1067+ return sock_tag_tree_search(&sock_tag_tree, sk);
1068+}
1069+
1070+static struct sock_tag *get_sock_stat(const struct sock *sk)
1071+{
1072+ struct sock_tag *sock_tag_entry;
1073+ MT_DEBUG("qtaguid: get_sock_stat(sk=%p)\n", sk);
1074+ if (!sk)
1075+ return NULL;
1076+ spin_lock_bh(&sock_tag_list_lock);
1077+ sock_tag_entry = get_sock_stat_nl(sk);
1078+ spin_unlock_bh(&sock_tag_list_lock);
1079+ return sock_tag_entry;
1080+}
1081+
1082+static int ipx_proto(const struct sk_buff *skb,
1083+ struct xt_action_param *par)
1084+{
1085+ int thoff = 0, tproto;
1086+
1087+ switch (par->state->pf) {
1088+ case NFPROTO_IPV6:
1089+ tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
1090+ if (tproto < 0)
1091+ MT_DEBUG("%s(): transport header not found in ipv6"
1092+ " skb=%p\n", __func__, skb);
1093+ break;
1094+ case NFPROTO_IPV4:
1095+ tproto = ip_hdr(skb)->protocol;
1096+ break;
1097+ default:
1098+ tproto = IPPROTO_RAW;
1099+ }
1100+ return tproto;
1101+}
1102+
1103+static void
1104+data_counters_update(struct data_counters *dc, int set,
1105+ enum ifs_tx_rx direction, int proto, int bytes)
1106+{
1107+ switch (proto) {
1108+ case IPPROTO_TCP:
1109+ dc_add_byte_packets(dc, set, direction, IFS_TCP, bytes, 1);
1110+ break;
1111+ case IPPROTO_UDP:
1112+ dc_add_byte_packets(dc, set, direction, IFS_UDP, bytes, 1);
1113+ break;
1114+ case IPPROTO_IP:
1115+ default:
1116+ dc_add_byte_packets(dc, set, direction, IFS_PROTO_OTHER, bytes,
1117+ 1);
1118+ break;
1119+ }
1120+}
1121+
1122+/*
1123+ * Update stats for the specified interface. Do nothing if the entry
1124+ * does not exist (when a device was never configured with an IP address).
1125+ * Called when an device is being unregistered.
1126+ */
1127+static void iface_stat_update(struct net_device *net_dev, bool stash_only)
1128+{
1129+ struct rtnl_link_stats64 dev_stats, *stats;
1130+ struct iface_stat *entry;
1131+
1132+ stats = dev_get_stats(net_dev, &dev_stats);
1133+ spin_lock_bh(&iface_stat_list_lock);
1134+ entry = get_iface_entry(net_dev->name);
1135+ if (entry == NULL) {
1136+ IF_DEBUG("qtaguid: iface_stat: update(%s): not tracked\n",
1137+ net_dev->name);
1138+ spin_unlock_bh(&iface_stat_list_lock);
1139+ return;
1140+ }
1141+
1142+ IF_DEBUG("qtaguid: %s(%s): entry=%p\n", __func__,
1143+ net_dev->name, entry);
1144+ if (!entry->active) {
1145+ IF_DEBUG("qtaguid: %s(%s): already disabled\n", __func__,
1146+ net_dev->name);
1147+ spin_unlock_bh(&iface_stat_list_lock);
1148+ return;
1149+ }
1150+
1151+ if (stash_only) {
1152+ entry->last_known[IFS_TX].bytes = stats->tx_bytes;
1153+ entry->last_known[IFS_TX].packets = stats->tx_packets;
1154+ entry->last_known[IFS_RX].bytes = stats->rx_bytes;
1155+ entry->last_known[IFS_RX].packets = stats->rx_packets;
1156+ entry->last_known_valid = true;
1157+ IF_DEBUG("qtaguid: %s(%s): "
1158+ "dev stats stashed rx/tx=%llu/%llu\n", __func__,
1159+ net_dev->name, stats->rx_bytes, stats->tx_bytes);
1160+ spin_unlock_bh(&iface_stat_list_lock);
1161+ return;
1162+ }
1163+ entry->totals_via_dev[IFS_TX].bytes += stats->tx_bytes;
1164+ entry->totals_via_dev[IFS_TX].packets += stats->tx_packets;
1165+ entry->totals_via_dev[IFS_RX].bytes += stats->rx_bytes;
1166+ entry->totals_via_dev[IFS_RX].packets += stats->rx_packets;
1167+ /* We don't need the last_known[] anymore */
1168+ entry->last_known_valid = false;
1169+ _iface_stat_set_active(entry, net_dev, false);
1170+ IF_DEBUG("qtaguid: %s(%s): "
1171+ "disable tracking. rx/tx=%llu/%llu\n", __func__,
1172+ net_dev->name, stats->rx_bytes, stats->tx_bytes);
1173+ spin_unlock_bh(&iface_stat_list_lock);
1174+}
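
The stash_only flag maps onto the two teardown events handled by the notifier callbacks below: NETDEV_DOWN only snapshots the device counters, since the device may come back up, while NETDEV_UNREGISTER folds them into the running totals and deactivates the entry. Schematically:

    NETDEV_DOWN        -> iface_stat_update(dev, true);   /* stash into last_known[] */
    NETDEV_UNREGISTER  -> iface_stat_update(dev, false);  /* fold into totals_via_dev[]
                                                             and deactivate the entry */
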
1175+
1176+/* Guaranteed to return a net_device that has a name */
1177+static void get_dev_and_dir(const struct sk_buff *skb,
1178+ struct xt_action_param *par,
1179+ enum ifs_tx_rx *direction,
1180+ const struct net_device **el_dev)
1181+{
1182+ const struct nf_hook_state *parst = par->state;
1183+
1184+ BUG_ON(!direction || !el_dev);
1185+
1186+ if (parst->in) {
1187+ *el_dev = parst->in;
1188+ *direction = IFS_RX;
1189+ } else if (parst->out) {
1190+ *el_dev = parst->out;
1191+ *direction = IFS_TX;
1192+ } else {
1193+ pr_err("qtaguid[%d]: %s(): no par->state->in/out?!!\n",
1194+ parst->hook, __func__);
1195+ BUG();
1196+ }
1197+ if (skb->dev && *el_dev != skb->dev) {
1198+ MT_DEBUG("qtaguid[%d]: skb->dev=%p %s vs par->%s=%p %s\n",
1199+ parst->hook, skb->dev, skb->dev->name,
1200+ *direction == IFS_RX ? "in" : "out", *el_dev,
1201+ (*el_dev)->name);
1202+ }
1203+}
1204+
1205+/*
1206+ * Update stats for the specified interface from the skb.
1207+ * Do nothing if the entry does not exist
1208+ * (when a device was never configured with an IP address).
1209+ * Called on each sk.
1210+ */
1211+static void iface_stat_update_from_skb(const struct sk_buff *skb,
1212+ struct xt_action_param *par)
1213+{
1214+ const struct nf_hook_state *parst = par->state;
1215+ struct iface_stat *entry;
1216+ const struct net_device *el_dev;
1217+ enum ifs_tx_rx direction;
1218+ int bytes = skb->len;
1219+ int proto;
1220+
1221+ get_dev_and_dir(skb, par, &direction, &el_dev);
1222+ proto = ipx_proto(skb, par);
1223+ MT_DEBUG("qtaguid[%d]: iface_stat: %s(%s): "
1224+ "type=%d fam=%d proto=%d dir=%d\n",
1225+ parst->hook, __func__, el_dev->name, el_dev->type,
1226+ parst->pf, proto, direction);
1227+
1228+ spin_lock_bh(&iface_stat_list_lock);
1229+ entry = get_iface_entry(el_dev->name);
1230+ if (entry == NULL) {
1231+ IF_DEBUG("qtaguid[%d]: iface_stat: %s(%s): not tracked\n",
1232+ parst->hook, __func__, el_dev->name);
1233+ spin_unlock_bh(&iface_stat_list_lock);
1234+ return;
1235+ }
1236+
1237+ IF_DEBUG("qtaguid[%d]: %s(%s): entry=%p\n", parst->hook, __func__,
1238+ el_dev->name, entry);
1239+
1240+ data_counters_update(&entry->totals_via_skb, 0, direction, proto,
1241+ bytes);
1242+ spin_unlock_bh(&iface_stat_list_lock);
1243+}
1244+
1245+static void tag_stat_update(struct tag_stat *tag_entry,
1246+ enum ifs_tx_rx direction, int proto, int bytes)
1247+{
1248+ int active_set;
1249+ active_set = get_active_counter_set(tag_entry->tn.tag);
1250+ MT_DEBUG("qtaguid: tag_stat_update(tag=0x%llx (uid=%u) set=%d "
1251+ "dir=%d proto=%d bytes=%d)\n",
1252+ tag_entry->tn.tag, get_uid_from_tag(tag_entry->tn.tag),
1253+ active_set, direction, proto, bytes);
1254+ data_counters_update(&tag_entry->counters, active_set, direction,
1255+ proto, bytes);
1256+ if (tag_entry->parent_counters)
1257+ data_counters_update(tag_entry->parent_counters, active_set,
1258+ direction, proto, bytes);
1259+}
1260+
1261+/*
1262+ * Create a new entry for tracking the specified {acct_tag,uid_tag} within
1263+ * the interface.
1264+ * iface_entry->tag_stat_list_lock should be held.
1265+ */
1266+static struct tag_stat *create_if_tag_stat(struct iface_stat *iface_entry,
1267+ tag_t tag)
1268+{
1269+ struct tag_stat *new_tag_stat_entry = NULL;
1270+ IF_DEBUG("qtaguid: iface_stat: %s(): ife=%p tag=0x%llx"
1271+ " (uid=%u)\n", __func__,
1272+ iface_entry, tag, get_uid_from_tag(tag));
1273+ new_tag_stat_entry = kzalloc(sizeof(*new_tag_stat_entry), GFP_ATOMIC);
1274+ if (!new_tag_stat_entry) {
1275+ pr_err("qtaguid: iface_stat: tag stat alloc failed\n");
1276+ goto done;
1277+ }
1278+ new_tag_stat_entry->tn.tag = tag;
1279+ tag_stat_tree_insert(new_tag_stat_entry, &iface_entry->tag_stat_tree);
1280+done:
1281+ return new_tag_stat_entry;
1282+}
1283+
1284+static void if_tag_stat_update(const char *ifname, uid_t uid,
1285+ const struct sock *sk, enum ifs_tx_rx direction,
1286+ int proto, int bytes)
1287+{
1288+ struct tag_stat *tag_stat_entry;
1289+ tag_t tag, acct_tag;
1290+ tag_t uid_tag;
1291+ struct data_counters *uid_tag_counters;
1292+ struct sock_tag *sock_tag_entry;
1293+ struct iface_stat *iface_entry;
1294+ struct tag_stat *new_tag_stat = NULL;
1295+ MT_DEBUG("qtaguid: if_tag_stat_update(ifname=%s "
1296+ "uid=%u sk=%p dir=%d proto=%d bytes=%d)\n",
1297+ ifname, uid, sk, direction, proto, bytes);
1298+
1299+ spin_lock_bh(&iface_stat_list_lock);
1300+ iface_entry = get_iface_entry(ifname);
1301+ if (!iface_entry) {
1302+ pr_err_ratelimited("qtaguid: tag_stat: stat_update() "
1303+ "%s not found\n", ifname);
1304+ spin_unlock_bh(&iface_stat_list_lock);
1305+ return;
1306+ }
1307+ /* It is ok to process data when an iface_entry is inactive */
1308+
1309+ MT_DEBUG("qtaguid: tag_stat: stat_update() dev=%s entry=%p\n",
1310+ ifname, iface_entry);
1311+
1312+ /*
1313+ * Look for a tagged sock.
1314+ * It will have an acct_uid.
1315+ */
1316+ sock_tag_entry = get_sock_stat(sk);
1317+ if (sock_tag_entry) {
1318+ tag = sock_tag_entry->tag;
1319+ acct_tag = get_atag_from_tag(tag);
1320+ uid_tag = get_utag_from_tag(tag);
1321+ } else {
1322+ acct_tag = make_atag_from_value(0);
1323+ tag = combine_atag_with_uid(acct_tag, uid);
1324+ uid_tag = make_tag_from_uid(uid);
1325+ }
1326+ MT_DEBUG("qtaguid: tag_stat: stat_update(): "
1327+ " looking for tag=0x%llx (uid=%u) in ife=%p\n",
1328+ tag, get_uid_from_tag(tag), iface_entry);
1329+ /* Loop over tag list under this interface for {acct_tag,uid_tag} */
1330+ spin_lock_bh(&iface_entry->tag_stat_list_lock);
1331+
1332+ tag_stat_entry = tag_stat_tree_search(&iface_entry->tag_stat_tree,
1333+ tag);
1334+ if (tag_stat_entry) {
1335+ /*
1336+ * Updating the {acct_tag, uid_tag} entry handles both stats:
1337+ * {0, uid_tag} will also get updated.
1338+ */
1339+ tag_stat_update(tag_stat_entry, direction, proto, bytes);
1340+ goto unlock;
1341+ }
1342+
1343+ /* Loop over tag list under this interface for {0,uid_tag} */
1344+ tag_stat_entry = tag_stat_tree_search(&iface_entry->tag_stat_tree,
1345+ uid_tag);
1346+ if (!tag_stat_entry) {
1347+ /* Here: the base uid_tag did not exist */
1348+ /*
1349+ * No parent counters. So
1350+ * - No {0, uid_tag} stats and no {acc_tag, uid_tag} stats.
1351+ */
1352+ new_tag_stat = create_if_tag_stat(iface_entry, uid_tag);
1353+ if (!new_tag_stat)
1354+ goto unlock;
1355+ uid_tag_counters = &new_tag_stat->counters;
1356+ } else {
1357+ uid_tag_counters = &tag_stat_entry->counters;
1358+ }
1359+
1360+ if (acct_tag) {
1361+ /* Create the child {acct_tag, uid_tag} and hook up parent. */
1362+ new_tag_stat = create_if_tag_stat(iface_entry, tag);
1363+ if (!new_tag_stat)
1364+ goto unlock;
1365+ new_tag_stat->parent_counters = uid_tag_counters;
1366+ } else {
1367+ /*
1368+ * For new_tag_stat to be still NULL here would require:
1369+ * {0, uid_tag} exists
1370+ * and {acct_tag, uid_tag} doesn't exist
1371+ * AND acct_tag == 0.
1372+ * Impossible. This reassures us that new_tag_stat
1373+ * below will always be assigned.
1374+ */
1375+ BUG_ON(!new_tag_stat);
1376+ }
1377+ tag_stat_update(new_tag_stat, direction, proto, bytes);
1378+unlock:
1379+ spin_unlock_bh(&iface_entry->tag_stat_list_lock);
1380+ spin_unlock_bh(&iface_stat_list_lock);
1381+}
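
The {acct_tag, uid_tag} bookkeeping above leans on the tag helpers from xt_qtaguid_internal.h (appended at the end of this patch). A hedged sketch of the layout those helpers imply, assuming a 64-bit tag that carries the accounting tag in the upper 32 bits and the uid in the lower 32:

    typedef u64 tag_t;

    static inline tag_t make_atag_from_value(u32 value)
    {
            return (tag_t)value << 32;      /* {acct_tag, uid=0} */
    }

    static inline tag_t make_tag_from_uid(uid_t uid)
    {
            return uid;                     /* {acct_tag=0, uid} */
    }

    static inline tag_t combine_atag_with_uid(tag_t acct_tag, uid_t uid)
    {
            return acct_tag | uid;
    }

    static inline uid_t get_uid_from_tag(tag_t tag)
    {
            return tag & 0xFFFFFFFFULL;
    }

With that layout, the no-tagged-sock fallback above yields tag == uid_tag == make_tag_from_uid(uid), so untagged traffic is accounted under the base {0, uid} entry.
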
1382+
1383+static int iface_netdev_event_handler(struct notifier_block *nb,
1384+ unsigned long event, void *ptr) {
1385+ struct net_device *dev = netdev_notifier_info_to_dev(ptr);
1386+
1387+ if (unlikely(module_passive))
1388+ return NOTIFY_DONE;
1389+
1390+ IF_DEBUG("qtaguid: iface_stat: netdev_event(): "
1391+ "ev=0x%lx/%s netdev=%p->name=%s\n",
1392+ event, netdev_evt_str(event), dev, dev ? dev->name : "");
1393+
1394+ switch (event) {
1395+ case NETDEV_UP:
1396+ iface_stat_create(dev, NULL);
1397+ atomic64_inc(&qtu_events.iface_events);
1398+ break;
1399+ case NETDEV_DOWN:
1400+ case NETDEV_UNREGISTER:
1401+ iface_stat_update(dev, event == NETDEV_DOWN);
1402+ atomic64_inc(&qtu_events.iface_events);
1403+ break;
1404+ }
1405+ return NOTIFY_DONE;
1406+}
1407+
1408+static int iface_inet6addr_event_handler(struct notifier_block *nb,
1409+ unsigned long event, void *ptr)
1410+{
1411+ struct inet6_ifaddr *ifa = ptr;
1412+ struct net_device *dev;
1413+
1414+ if (unlikely(module_passive))
1415+ return NOTIFY_DONE;
1416+
1417+ IF_DEBUG("qtaguid: iface_stat: inet6addr_event(): "
1418+ "ev=0x%lx/%s ifa=%p\n",
1419+ event, netdev_evt_str(event), ifa);
1420+
1421+ switch (event) {
1422+ case NETDEV_UP:
1423+ BUG_ON(!ifa || !ifa->idev);
1424+ dev = (struct net_device *)ifa->idev->dev;
1425+ iface_stat_create_ipv6(dev, ifa);
1426+ atomic64_inc(&qtu_events.iface_events);
1427+ break;
1428+ case NETDEV_DOWN:
1429+ case NETDEV_UNREGISTER:
1430+ BUG_ON(!ifa || !ifa->idev);
1431+ dev = (struct net_device *)ifa->idev->dev;
1432+ iface_stat_update(dev, event == NETDEV_DOWN);
1433+ atomic64_inc(&qtu_events.iface_events);
1434+ break;
1435+ }
1436+ return NOTIFY_DONE;
1437+}
1438+
1439+static int iface_inetaddr_event_handler(struct notifier_block *nb,
1440+ unsigned long event, void *ptr)
1441+{
1442+ struct in_ifaddr *ifa = ptr;
1443+ struct net_device *dev;
1444+
1445+ if (unlikely(module_passive))
1446+ return NOTIFY_DONE;
1447+
1448+ IF_DEBUG("qtaguid: iface_stat: inetaddr_event(): "
1449+ "ev=0x%lx/%s ifa=%p\n",
1450+ event, netdev_evt_str(event), ifa);
1451+
1452+ switch (event) {
1453+ case NETDEV_UP:
1454+ BUG_ON(!ifa || !ifa->ifa_dev);
1455+ dev = ifa->ifa_dev->dev;
1456+ iface_stat_create(dev, ifa);
1457+ atomic64_inc(&qtu_events.iface_events);
1458+ break;
1459+ case NETDEV_DOWN:
1460+ case NETDEV_UNREGISTER:
1461+ BUG_ON(!ifa || !ifa->ifa_dev);
1462+ dev = ifa->ifa_dev->dev;
1463+ iface_stat_update(dev, event == NETDEV_DOWN);
1464+ atomic64_inc(&qtu_events.iface_events);
1465+ break;
1466+ }
1467+ return NOTIFY_DONE;
1468+}
1469+
1470+static struct notifier_block iface_netdev_notifier_blk = {
1471+ .notifier_call = iface_netdev_event_handler,
1472+};
1473+
1474+static struct notifier_block iface_inetaddr_notifier_blk = {
1475+ .notifier_call = iface_inetaddr_event_handler,
1476+};
1477+
1478+static struct notifier_block iface_inet6addr_notifier_blk = {
1479+ .notifier_call = iface_inet6addr_event_handler,
1480+};
1481+
1482+static const struct seq_operations iface_stat_fmt_proc_seq_ops = {
1483+ .start = iface_stat_fmt_proc_start,
1484+ .next = iface_stat_fmt_proc_next,
1485+ .stop = iface_stat_fmt_proc_stop,
1486+ .show = iface_stat_fmt_proc_show,
1487+};
1488+
1489+static int proc_iface_stat_fmt_open(struct inode *inode, struct file *file)
1490+{
1491+ struct proc_iface_stat_fmt_info *s;
1492+
1493+ s = __seq_open_private(file, &iface_stat_fmt_proc_seq_ops,
1494+ sizeof(struct proc_iface_stat_fmt_info));
1495+ if (!s)
1496+ return -ENOMEM;
1497+
1498+ s->fmt = (uintptr_t)PDE_DATA(inode);
1499+ return 0;
1500+}
1501+
1502+static const struct file_operations proc_iface_stat_fmt_fops = {
1503+ .open = proc_iface_stat_fmt_open,
1504+ .read = seq_read,
1505+ .llseek = seq_lseek,
1506+ .release = seq_release_private,
1507+};
1508+
1509+static int __init iface_stat_init(struct proc_dir_entry *parent_procdir)
1510+{
1511+ int err;
1512+
1513+ iface_stat_procdir = proc_mkdir(iface_stat_procdirname, parent_procdir);
1514+ if (!iface_stat_procdir) {
1515+ pr_err("qtaguid: iface_stat: init failed to create proc entry\n");
1516+ err = -1;
1517+ goto err;
1518+ }
1519+
1520+ iface_stat_all_procfile = proc_create_data(iface_stat_all_procfilename,
1521+ proc_iface_perms,
1522+ parent_procdir,
1523+ &proc_iface_stat_fmt_fops,
1524+ (void *)1 /* fmt1 */);
1525+ if (!iface_stat_all_procfile) {
1526+ pr_err("qtaguid: iface_stat: init "
1527+ " failed to create stat_old proc entry\n");
1528+ err = -1;
1529+ goto err_zap_entry;
1530+ }
1531+
1532+ iface_stat_fmt_procfile = proc_create_data(iface_stat_fmt_procfilename,
1533+ proc_iface_perms,
1534+ parent_procdir,
1535+ &proc_iface_stat_fmt_fops,
1536+ (void *)2 /* fmt2 */);
1537+ if (!iface_stat_fmt_procfile) {
1538+ pr_err("qtaguid: iface_stat: init "
1539+ " failed to create stat_all proc entry\n");
1540+ err = -1;
1541+ goto err_zap_all_stats_entry;
1542+ }
1543+
1544+
1545+ err = register_netdevice_notifier(&iface_netdev_notifier_blk);
1546+ if (err) {
1547+ pr_err("qtaguid: iface_stat: init "
1548+ "failed to register dev event handler\n");
1549+ goto err_zap_all_stats_entries;
1550+ }
1551+ err = register_inetaddr_notifier(&iface_inetaddr_notifier_blk);
1552+ if (err) {
1553+ pr_err("qtaguid: iface_stat: init "
1554+ "failed to register ipv4 dev event handler\n");
1555+ goto err_unreg_nd;
1556+ }
1557+
1558+ err = register_inet6addr_notifier(&iface_inet6addr_notifier_blk);
1559+ if (err) {
1560+ pr_err("qtaguid: iface_stat: init "
1561+ "failed to register ipv6 dev event handler\n");
1562+ goto err_unreg_ip4_addr;
1563+ }
1564+ return 0;
1565+
1566+err_unreg_ip4_addr:
1567+ unregister_inetaddr_notifier(&iface_inetaddr_notifier_blk);
1568+err_unreg_nd:
1569+ unregister_netdevice_notifier(&iface_netdev_notifier_blk);
1570+err_zap_all_stats_entries:
1571+ remove_proc_entry(iface_stat_fmt_procfilename, parent_procdir);
1572+err_zap_all_stats_entry:
1573+ remove_proc_entry(iface_stat_all_procfilename, parent_procdir);
1574+err_zap_entry:
1575+ remove_proc_entry(iface_stat_procdirname, parent_procdir);
1576+err:
1577+ return err;
1578+}
1579+
1580+static struct sock *qtaguid_find_sk(const struct sk_buff *skb,
1581+ struct xt_action_param *par)
1582+{
1583+ const struct nf_hook_state *parst = par->state;
1584+ struct sock *sk;
1585+ unsigned int hook_mask = (1 << parst->hook);
1586+
1587+ MT_DEBUG("qtaguid[%d]: find_sk(skb=%p) family=%d\n",
1588+ parst->hook, skb, parst->pf);
1589+
1590+ /*
1591+ * Let's not abuse the xt_socket_get*_sk(), or else it will
1592+ * return garbage SKs.
1593+ */
1594+ if (!(hook_mask & XT_SOCKET_SUPPORTED_HOOKS))
1595+ return NULL;
1596+
1597+ switch (parst->pf) {
1598+ case NFPROTO_IPV6:
1599+ sk = nf_sk_lookup_slow_v6(dev_net(skb->dev), skb, parst->in);
1600+ break;
1601+ case NFPROTO_IPV4:
1602+ sk = nf_sk_lookup_slow_v4(dev_net(skb->dev), skb, parst->in);
1603+ break;
1604+ default:
1605+ return NULL;
1606+ }
1607+
1608+ if (sk) {
1609+ MT_DEBUG("qtaguid[%d]: %p->sk_proto=%u->sk_state=%d\n",
1610+ parst->hook, sk, sk->sk_protocol, sk->sk_state);
1611+ }
1612+ return sk;
1613+}
1614+
1615+static void account_for_uid(const struct sk_buff *skb,
1616+ const struct sock *alternate_sk, uid_t uid,
1617+ struct xt_action_param *par)
1618+{
1619+ const struct net_device *el_dev;
1620+ enum ifs_tx_rx direction;
1621+ int proto;
1622+
1623+ get_dev_and_dir(skb, par, &direction, &el_dev);
1624+ proto = ipx_proto(skb, par);
1625+ MT_DEBUG("qtaguid[%d]: dev name=%s type=%d fam=%d proto=%d dir=%d\n",
1626+ par->state->hook, el_dev->name, el_dev->type,
1627+ par->state->pf, proto, direction);
1628+
1629+ if_tag_stat_update(el_dev->name, uid,
1630+ skb->sk ? skb->sk : alternate_sk,
1631+ direction,
1632+ proto, skb->len);
1633+}
1634+
1635+static bool qtaguid_mt(const struct sk_buff *skb, struct xt_action_param *par)
1636+{
1637+ const struct xt_qtaguid_match_info *info = par->matchinfo;
1638+ const struct nf_hook_state *parst = par->state;
1639+ const struct file *filp;
1640+ bool got_sock = false;
1641+ struct sock *sk;
1642+ kuid_t sock_uid;
1643+ bool res;
1644+ bool set_sk_callback_lock = false;
1645+ /*
1646+ * TODO: unhack how to force just accounting.
1647+ * For now we only do tag stats when the uid-owner is not requested
1648+ */
1649+ bool do_tag_stat = !(info->match & XT_QTAGUID_UID);
1650+
1651+ if (unlikely(module_passive))
1652+ return (info->match ^ info->invert) == 0;
1653+
1654+ MT_DEBUG("qtaguid[%d]: entered skb=%p par->in=%p/out=%p fam=%d\n",
1655+ parst->hook, skb, parst->in, parst->out, parst->pf);
1656+
1657+ atomic64_inc(&qtu_events.match_calls);
1658+ if (skb == NULL) {
1659+ res = (info->match ^ info->invert) == 0;
1660+ goto ret_res;
1661+ }
1662+
1663+ switch (parst->hook) {
1664+ case NF_INET_PRE_ROUTING:
1665+ case NF_INET_POST_ROUTING:
1666+ atomic64_inc(&qtu_events.match_calls_prepost);
1667+ iface_stat_update_from_skb(skb, par);
1668+ /*
1669+ * We are done in pre/post. The skb will get processed
1670+ * further later.
1671+ */
1672+ res = (info->match ^ info->invert);
1673+ goto ret_res;
1674+ break;
1675+ /* default: Fall through and do UID related work */
1676+ }
1677+
1678+ sk = skb_to_full_sk(skb);
1679+ /*
1680+ * When in TCP_TIME_WAIT the sk is not a "struct sock" but
1681+ * "struct inet_timewait_sock" which is missing fields.
1682+ * So we ignore it.
1683+ */
1684+ if (sk && sk->sk_state == TCP_TIME_WAIT)
1685+ sk = NULL;
1686+ if (sk == NULL) {
1687+ /*
1688+ * A missing sk->sk_socket happens when packets are in-flight
1689+ * and the matching socket is already closed and gone.
1690+ */
1691+ sk = qtaguid_find_sk(skb, par);
1692+ /*
1693+ * A TCP_NEW_SYN_RECV sk is a "struct request_sock", not a "struct sock";
1694+ * sk_to_full_sk() gets us the full socket so we can retrieve uid/gid.
1695+ * When in TCP_TIME_WAIT, sk is a struct inet_timewait_sock
1696+ * which is missing fields and does not contain any reference
1697+ * to a full socket, so just ignore the socket.
1698+ */
1699+ if (sk && sk->sk_state == TCP_NEW_SYN_RECV) {
1700+ sock_gen_put(sk);
1701+ sk = sk_to_full_sk(sk);
1702+ } else if (sk && (!sk_fullsock(sk) || sk->sk_state == TCP_TIME_WAIT)) {
1703+ sock_gen_put(sk);
1704+ sk = NULL;
1705+ } else {
1706+ /*
1707+ * If we got the socket from qtaguid_find_sk(), we will need to
1708+ * put it back, as nf_sk_lookup_slow_v4/v6() took a reference.
1709+ */
1710+ got_sock = sk;
1711+ }
1712+ if (sk)
1713+ atomic64_inc(&qtu_events.match_found_sk_in_ct);
1714+ else
1715+ atomic64_inc(&qtu_events.match_found_no_sk_in_ct);
1716+ } else {
1717+ atomic64_inc(&qtu_events.match_found_sk);
1718+ }
1719+ MT_DEBUG("qtaguid[%d]: sk=%p got_sock=%d fam=%d proto=%d\n",
1720+ parst->hook, sk, got_sock, parst->pf, ipx_proto(skb, par));
1721+
1722+ if (!sk) {
1723+ /*
1724+ * Here, the qtaguid_find_sk() using connection tracking
1725+ * couldn't find the owner, so for now we just count them
1726+ * against the system.
1727+ */
1728+ if (do_tag_stat)
1729+ account_for_uid(skb, sk, 0, par);
1730+ MT_DEBUG("qtaguid[%d]: leaving (sk=NULL)\n", parst->hook);
1731+ res = (info->match ^ info->invert) == 0;
1732+ atomic64_inc(&qtu_events.match_no_sk);
1733+ goto put_sock_ret_res;
1734+ } else if (info->match & info->invert & XT_QTAGUID_SOCKET) {
1735+ res = false;
1736+ goto put_sock_ret_res;
1737+ }
1738+ sock_uid = sk->sk_uid;
1739+ if (do_tag_stat)
1740+ account_for_uid(skb, sk, from_kuid(&init_user_ns, sock_uid),
1741+ par);
1742+
1743+ /*
1744+ * The following two tests fail the match when:
1745+ * id not in range AND no inverted condition requested
1746+ * or id in range AND inverted condition requested
1747+ * Thus (!a && b) || (a && !b) == a ^ b
1748+ */
1749+ if (info->match & XT_QTAGUID_UID) {
1750+ kuid_t uid_min = make_kuid(&init_user_ns, info->uid_min);
1751+ kuid_t uid_max = make_kuid(&init_user_ns, info->uid_max);
1752+
1753+ if ((uid_gte(sock_uid, uid_min) &&
1754+ uid_lte(sock_uid, uid_max)) ^
1755+ !(info->invert & XT_QTAGUID_UID)) {
1756+ MT_DEBUG("qtaguid[%d]: leaving uid not matching\n",
1757+ parst->hook);
1758+ res = false;
1759+ goto put_sock_ret_res;
1760+ }
1761+ }
1762+ if (info->match & XT_QTAGUID_GID) {
1763+ kgid_t gid_min = make_kgid(&init_user_ns, info->gid_min);
1764+ kgid_t gid_max = make_kgid(&init_user_ns, info->gid_max);
1765+ set_sk_callback_lock = true;
1766+ read_lock_bh(&sk->sk_callback_lock);
1767+ MT_DEBUG("qtaguid[%d]: sk=%p->sk_socket=%p->file=%p\n",
1768+ parst->hook, sk, sk->sk_socket,
1769+ sk->sk_socket ? sk->sk_socket->file : (void *)-1LL);
1770+ filp = sk->sk_socket ? sk->sk_socket->file : NULL;
1771+ if (!filp) {
1772+ res = ((info->match ^ info->invert) &
1773+ XT_QTAGUID_GID) == 0;
1774+ atomic64_inc(&qtu_events.match_no_sk_gid);
1775+ goto put_sock_ret_res;
1776+ }
1777+ MT_DEBUG("qtaguid[%d]: filp...uid=%u\n",
1778+ parst->hook, filp ?
1779+ from_kuid(&init_user_ns, filp->f_cred->fsuid) : -1);
1780+ if ((gid_gte(filp->f_cred->fsgid, gid_min) &&
1781+ gid_lte(filp->f_cred->fsgid, gid_max)) ^
1782+ !(info->invert & XT_QTAGUID_GID)) {
1783+ MT_DEBUG("qtaguid[%d]: leaving gid not matching\n",
1784+ parst->hook);
1785+ res = false;
1786+ goto put_sock_ret_res;
1787+ }
1788+ }
1789+ MT_DEBUG("qtaguid[%d]: leaving matched\n", parst->hook);
1790+ res = true;
1791+
1792+put_sock_ret_res:
1793+ if (got_sock)
1794+ sock_gen_put(sk);
1795+ if (set_sk_callback_lock)
1796+ read_unlock_bh(&sk->sk_callback_lock);
1797+ret_res:
1798+ MT_DEBUG("qtaguid[%d]: left %d\n", parst->hook, res);
1799+ return res;
1800+}
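
The "a ^ b" identity noted in the comment inside qtaguid_mt() is easiest to verify case by case; for the uid test:

    in_range  inverted   in_range ^ !inverted   outcome
       0         0                1             fail the match
       1         0                0             keep matching
       0         1                0             keep matching (inversion requested)
       1         1                1             fail the match
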
1801+
1802+#ifdef DDEBUG
1803+/*
1804+ * This function is not in xt_qtaguid_print.c because of lock visibility.
1805+ * sock_tag_list_lock must be acquired before calling this function.
1806+ */
1807+static void prdebug_full_state_locked(int indent_level, const char *fmt, ...)
1808+{
1809+ va_list args;
1810+ char *fmt_buff;
1811+ char *buff;
1812+
1813+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
1814+ return;
1815+
1816+ fmt_buff = kasprintf(GFP_ATOMIC,
1817+ "qtaguid: %s(): %s {\n", __func__, fmt);
1818+ BUG_ON(!fmt_buff);
1819+ va_start(args, fmt);
1820+ buff = kvasprintf(GFP_ATOMIC,
1821+ fmt_buff, args);
1822+ BUG_ON(!buff);
1823+ pr_debug("%s", buff);
1824+ kfree(fmt_buff);
1825+ kfree(buff);
1826+ va_end(args);
1827+
1828+ prdebug_sock_tag_tree(indent_level, &sock_tag_tree);
1829+
1830+ spin_lock_bh(&uid_tag_data_tree_lock);
1831+ prdebug_uid_tag_data_tree(indent_level, &uid_tag_data_tree);
1832+ prdebug_proc_qtu_data_tree(indent_level, &proc_qtu_data_tree);
1833+ spin_unlock_bh(&uid_tag_data_tree_lock);
1834+
1835+ spin_lock_bh(&iface_stat_list_lock);
1836+ prdebug_iface_stat_list(indent_level, &iface_stat_list);
1837+ spin_unlock_bh(&iface_stat_list_lock);
1838+
1839+ pr_debug("qtaguid: %s(): }\n", __func__);
1840+}
1841+#else
1842+static void prdebug_full_state_locked(int indent_level, const char *fmt, ...) {}
1843+#endif
1844+
1845+struct proc_ctrl_print_info {
1846+ struct sock *sk; /* socket found by reading to sk_pos */
1847+ loff_t sk_pos;
1848+};
1849+
1850+static void *qtaguid_ctrl_proc_next(struct seq_file *m, void *v, loff_t *pos)
1851+{
1852+ struct proc_ctrl_print_info *pcpi = m->private;
1853+ struct sock_tag *sock_tag_entry = v;
1854+ struct rb_node *node;
1855+
1856+ (*pos)++;
1857+
1858+ if (!v || v == SEQ_START_TOKEN)
1859+ return NULL;
1860+
1861+ node = rb_next(&sock_tag_entry->sock_node);
1862+ if (!node) {
1863+ pcpi->sk = NULL;
1864+ sock_tag_entry = SEQ_START_TOKEN;
1865+ } else {
1866+ sock_tag_entry = rb_entry(node, struct sock_tag, sock_node);
1867+ pcpi->sk = sock_tag_entry->sk;
1868+ }
1869+ pcpi->sk_pos = *pos;
1870+ return sock_tag_entry;
1871+}
1872+
1873+static void *qtaguid_ctrl_proc_start(struct seq_file *m, loff_t *pos)
1874+{
1875+ struct proc_ctrl_print_info *pcpi = m->private;
1876+ struct sock_tag *sock_tag_entry;
1877+ struct rb_node *node;
1878+
1879+ spin_lock_bh(&sock_tag_list_lock);
1880+
1881+ if (unlikely(module_passive))
1882+ return NULL;
1883+
1884+ if (*pos == 0) {
1885+ pcpi->sk_pos = 0;
1886+ node = rb_first(&sock_tag_tree);
1887+ if (!node) {
1888+ pcpi->sk = NULL;
1889+ return SEQ_START_TOKEN;
1890+ }
1891+ sock_tag_entry = rb_entry(node, struct sock_tag, sock_node);
1892+ pcpi->sk = sock_tag_entry->sk;
1893+ } else {
1894+ sock_tag_entry = (pcpi->sk ? get_sock_stat_nl(pcpi->sk) :
1895+ NULL) ?: SEQ_START_TOKEN;
1896+ if (*pos != pcpi->sk_pos) {
1897+ /* seq_read skipped a next call */
1898+ *pos = pcpi->sk_pos;
1899+ return qtaguid_ctrl_proc_next(m, sock_tag_entry, pos);
1900+ }
1901+ }
1902+ return sock_tag_entry;
1903+}
1904+
1905+static void qtaguid_ctrl_proc_stop(struct seq_file *m, void *v)
1906+{
1907+ spin_unlock_bh(&sock_tag_list_lock);
1908+}
1909+
1910+/*
1911+ * Procfs reader to get all active socket tags using style "1)" as described in
1912+ * fs/proc/generic.c
1913+ */
1914+static int qtaguid_ctrl_proc_show(struct seq_file *m, void *v)
1915+{
1916+ struct sock_tag *sock_tag_entry = v;
1917+ uid_t uid;
1918+
1919+ CT_DEBUG("qtaguid: proc ctrl pid=%u tgid=%u uid=%u\n",
1920+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
1921+
1922+ if (sock_tag_entry != SEQ_START_TOKEN) {
1923+ int sk_ref_count;
1924+ uid = get_uid_from_tag(sock_tag_entry->tag);
1925+ CT_DEBUG("qtaguid: proc_read(): sk=%p tag=0x%llx (uid=%u) "
1926+ "pid=%u\n",
1927+ sock_tag_entry->sk,
1928+ sock_tag_entry->tag,
1929+ uid,
1930+ sock_tag_entry->pid
1931+ );
1932+ sk_ref_count = refcount_read(
1933+ &sock_tag_entry->sk->sk_refcnt);
1934+ seq_printf(m, "sock=%pK tag=0x%llx (uid=%u) pid=%u "
1935+ "f_count=%d\n",
1936+ sock_tag_entry->sk,
1937+ sock_tag_entry->tag, uid,
1938+ sock_tag_entry->pid, sk_ref_count);
1939+ } else {
1940+ seq_printf(m, "events: sockets_tagged=%llu "
1941+ "sockets_untagged=%llu "
1942+ "counter_set_changes=%llu "
1943+ "delete_cmds=%llu "
1944+ "iface_events=%llu "
1945+ "match_calls=%llu "
1946+ "match_calls_prepost=%llu "
1947+ "match_found_sk=%llu "
1948+ "match_found_sk_in_ct=%llu "
1949+ "match_found_no_sk_in_ct=%llu "
1950+ "match_no_sk=%llu "
1951+ "match_no_sk_gid=%llu\n",
1952+ (u64)atomic64_read(&qtu_events.sockets_tagged),
1953+ (u64)atomic64_read(&qtu_events.sockets_untagged),
1954+ (u64)atomic64_read(&qtu_events.counter_set_changes),
1955+ (u64)atomic64_read(&qtu_events.delete_cmds),
1956+ (u64)atomic64_read(&qtu_events.iface_events),
1957+ (u64)atomic64_read(&qtu_events.match_calls),
1958+ (u64)atomic64_read(&qtu_events.match_calls_prepost),
1959+ (u64)atomic64_read(&qtu_events.match_found_sk),
1960+ (u64)atomic64_read(&qtu_events.match_found_sk_in_ct),
1961+ (u64)atomic64_read(&qtu_events.match_found_no_sk_in_ct),
1962+ (u64)atomic64_read(&qtu_events.match_no_sk),
1963+ (u64)atomic64_read(&qtu_events.match_no_sk_gid));
1964+
1965+ /* Count the following as part of the last item_index. No need
1966+ * to lock the sock_tag_list here since it is already locked when
1967+ * starting the seq_file operation
1968+ */
1969+ prdebug_full_state_locked(0, "proc ctrl");
1970+ }
1971+
1972+ return 0;
1973+}
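
For reference, a per-socket line produced by the seq_printf() above looks like this (all values illustrative):

    sock=ffff88007400b700 tag=0x2b00000000002710 (uid=10000) pid=1234 f_count=3

The low 32 bits of the tag (0x2710 == 10000) are the uid, matching the layout sketched earlier; note that the field printed as f_count is actually the sk_refcnt read just above.
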
1974+
1975+/*
1976+ * Delete socket tags, and stat tags associated with a given
1977+ * accounting tag and uid.
1978+ */
1979+static int ctrl_cmd_delete(const char *input)
1980+{
1981+ char cmd;
1982+ int uid_int;
1983+ kuid_t uid;
1984+ uid_t entry_uid;
1985+ tag_t acct_tag;
1986+ tag_t tag;
1987+ int res, argc;
1988+ struct iface_stat *iface_entry;
1989+ struct rb_node *node;
1990+ struct sock_tag *st_entry;
1991+ struct rb_root st_to_free_tree = RB_ROOT;
1992+ struct tag_stat *ts_entry;
1993+ struct tag_counter_set *tcs_entry;
1994+ struct tag_ref *tr_entry;
1995+ struct uid_tag_data *utd_entry;
1996+
1997+ argc = sscanf(input, "%c %llu %u", &cmd, &acct_tag, &uid_int);
1998+ uid = make_kuid(&init_user_ns, uid_int);
1999+ CT_DEBUG("qtaguid: ctrl_delete(%s): argc=%d cmd=%c "
2000+ "user_tag=0x%llx uid=%u\n", input, argc, cmd,
2001+ acct_tag, uid_int);
2002+ if (argc < 2) {
2003+ res = -EINVAL;
2004+ goto err;
2005+ }
2006+ if (!valid_atag(acct_tag)) {
2007+ pr_info("qtaguid: ctrl_delete(%s): invalid tag\n", input);
2008+ res = -EINVAL;
2009+ goto err;
2010+ }
2011+ if (argc < 3) {
2012+ uid = current_fsuid();
2013+ uid_int = from_kuid(&init_user_ns, uid);
2014+ } else if (!can_impersonate_uid(uid)) {
2015+ pr_info("qtaguid: ctrl_delete(%s): "
2016+ "insufficient priv from pid=%u tgid=%u uid=%u\n",
2017+ input, current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
2018+ res = -EPERM;
2019+ goto err;
2020+ }
2021+
2022+ tag = combine_atag_with_uid(acct_tag, uid_int);
2023+ CT_DEBUG("qtaguid: ctrl_delete(%s): "
2024+ "looking for tag=0x%llx (uid=%u)\n",
2025+ input, tag, uid_int);
2026+
2027+ /* Delete socket tags */
2028+ spin_lock_bh(&sock_tag_list_lock);
2029+ spin_lock_bh(&uid_tag_data_tree_lock);
2030+ node = rb_first(&sock_tag_tree);
2031+ while (node) {
2032+ st_entry = rb_entry(node, struct sock_tag, sock_node);
2033+ entry_uid = get_uid_from_tag(st_entry->tag);
2034+ node = rb_next(node);
2035+ if (entry_uid != uid_int)
2036+ continue;
2037+
2038+ CT_DEBUG("qtaguid: ctrl_delete(%s): st tag=0x%llx (uid=%u)\n",
2039+ input, st_entry->tag, entry_uid);
2040+
2041+ if (!acct_tag || st_entry->tag == tag) {
2042+ rb_erase(&st_entry->sock_node, &sock_tag_tree);
2043+ /* Can't sockfd_put() within spinlock, do it later. */
2044+ sock_tag_tree_insert(st_entry, &st_to_free_tree);
2045+ tr_entry = lookup_tag_ref(st_entry->tag, NULL);
2046+ BUG_ON(tr_entry->num_sock_tags <= 0);
2047+ tr_entry->num_sock_tags--;
2048+ /*
2049+ * TODO: remove if, and start failing.
2050+ * This is a hack to work around the fact that in some
2051+ * places we have "if (IS_ERR_OR_NULL(pqd_entry))"
2052+ * to tolerate apps that didn't open
2053+ * /dev/xt_qtaguid.
2054+ */
2055+ if (st_entry->list.next && st_entry->list.prev)
2056+ list_del(&st_entry->list);
2057+ }
2058+ }
2059+ spin_unlock_bh(&uid_tag_data_tree_lock);
2060+ spin_unlock_bh(&sock_tag_list_lock);
2061+
2062+ sock_tag_tree_erase(&st_to_free_tree);
2063+
2064+ /* Delete tag counter-sets */
2065+ spin_lock_bh(&tag_counter_set_list_lock);
2066+ /* Counter sets are only on the uid tag, not full tag */
2067+ tcs_entry = tag_counter_set_tree_search(&tag_counter_set_tree, tag);
2068+ if (tcs_entry) {
2069+ CT_DEBUG("qtaguid: ctrl_delete(%s): "
2070+ "erase tcs: tag=0x%llx (uid=%u) set=%d\n",
2071+ input,
2072+ tcs_entry->tn.tag,
2073+ get_uid_from_tag(tcs_entry->tn.tag),
2074+ tcs_entry->active_set);
2075+ rb_erase(&tcs_entry->tn.node, &tag_counter_set_tree);
2076+ kfree(tcs_entry);
2077+ }
2078+ spin_unlock_bh(&tag_counter_set_list_lock);
2079+
2080+ /*
2081+ * If acct_tag is 0, then all entries belonging to uid are
2082+ * erased.
2083+ */
2084+ spin_lock_bh(&iface_stat_list_lock);
2085+ list_for_each_entry(iface_entry, &iface_stat_list, list) {
2086+ spin_lock_bh(&iface_entry->tag_stat_list_lock);
2087+ node = rb_first(&iface_entry->tag_stat_tree);
2088+ while (node) {
2089+ ts_entry = rb_entry(node, struct tag_stat, tn.node);
2090+ entry_uid = get_uid_from_tag(ts_entry->tn.tag);
2091+ node = rb_next(node);
2092+
2093+ CT_DEBUG("qtaguid: ctrl_delete(%s): "
2094+ "ts tag=0x%llx (uid=%u)\n",
2095+ input, ts_entry->tn.tag, entry_uid);
2096+
2097+ if (entry_uid != uid_int)
2098+ continue;
2099+ if (!acct_tag || ts_entry->tn.tag == tag) {
2100+ CT_DEBUG("qtaguid: ctrl_delete(%s): "
2101+ "erase ts: %s 0x%llx %u\n",
2102+ input, iface_entry->ifname,
2103+ get_atag_from_tag(ts_entry->tn.tag),
2104+ entry_uid);
2105+ rb_erase(&ts_entry->tn.node,
2106+ &iface_entry->tag_stat_tree);
2107+ kfree(ts_entry);
2108+ }
2109+ }
2110+ spin_unlock_bh(&iface_entry->tag_stat_list_lock);
2111+ }
2112+ spin_unlock_bh(&iface_stat_list_lock);
2113+
2114+ /* Cleanup the uid_tag_data */
2115+ spin_lock_bh(&uid_tag_data_tree_lock);
2116+ node = rb_first(&uid_tag_data_tree);
2117+ while (node) {
2118+ utd_entry = rb_entry(node, struct uid_tag_data, node);
2119+ entry_uid = utd_entry->uid;
2120+ node = rb_next(node);
2121+
2122+ CT_DEBUG("qtaguid: ctrl_delete(%s): "
2123+ "utd uid=%u\n",
2124+ input, entry_uid);
2125+
2126+ if (entry_uid != uid_int)
2127+ continue;
2128+ /*
2129+ * Go over the tag_refs, and those that don't have
2130+ * sock_tags using them are freed.
2131+ */
2132+ put_tag_ref_tree(tag, utd_entry);
2133+ put_utd_entry(utd_entry);
2134+ }
2135+ spin_unlock_bh(&uid_tag_data_tree_lock);
2136+
2137+ atomic64_inc(&qtu_events.delete_cmds);
2138+ res = 0;
2139+
2140+err:
2141+ return res;
2142+}
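
Given the "%c %llu %u" format above, acct_tag arrives as a plain decimal u64 that must already occupy the accounting half of the tag (valid_atag(), defined elsewhere, presumably rejects values with uid bits set). Illustrative ctrl writes, assuming the tag layout sketched earlier:

    d 0 10005              delete every tag and stat belonging to uid 10005
    d 47244640256 10005    delete only the {acct_tag=11, uid=10005} entries
                           (47244640256 == 11 << 32)
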
2143+
2144+static int ctrl_cmd_counter_set(const char *input)
2145+{
2146+ char cmd;
2147+ uid_t uid = 0;
2148+ tag_t tag;
2149+ int res, argc;
2150+ struct tag_counter_set *tcs;
2151+ int counter_set;
2152+
2153+ argc = sscanf(input, "%c %d %u", &cmd, &counter_set, &uid);
2154+ CT_DEBUG("qtaguid: ctrl_counterset(%s): argc=%d cmd=%c "
2155+ "set=%d uid=%u\n", input, argc, cmd,
2156+ counter_set, uid);
2157+ if (argc != 3) {
2158+ res = -EINVAL;
2159+ goto err;
2160+ }
2161+ if (counter_set < 0 || counter_set >= IFS_MAX_COUNTER_SETS) {
2162+ pr_info("qtaguid: ctrl_counterset(%s): invalid counter_set range\n",
2163+ input);
2164+ res = -EINVAL;
2165+ goto err;
2166+ }
2167+ if (!can_manipulate_uids()) {
2168+ pr_info("qtaguid: ctrl_counterset(%s): "
2169+ "insufficient priv from pid=%u tgid=%u uid=%u\n",
2170+ input, current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
2171+ res = -EPERM;
2172+ goto err;
2173+ }
2174+
2175+ tag = make_tag_from_uid(uid);
2176+ spin_lock_bh(&tag_counter_set_list_lock);
2177+ tcs = tag_counter_set_tree_search(&tag_counter_set_tree, tag);
2178+ if (!tcs) {
2179+ tcs = kzalloc(sizeof(*tcs), GFP_ATOMIC);
2180+ if (!tcs) {
2181+ spin_unlock_bh(&tag_counter_set_list_lock);
2182+ pr_err("qtaguid: ctrl_counterset(%s): "
2183+ "failed to alloc counter set\n",
2184+ input);
2185+ res = -ENOMEM;
2186+ goto err;
2187+ }
2188+ tcs->tn.tag = tag;
2189+ tag_counter_set_tree_insert(tcs, &tag_counter_set_tree);
2190+ CT_DEBUG("qtaguid: ctrl_counterset(%s): added tcs tag=0x%llx "
2191+ "(uid=%u) set=%d\n",
2192+ input, tag, get_uid_from_tag(tag), counter_set);
2193+ }
2194+ tcs->active_set = counter_set;
2195+ spin_unlock_bh(&tag_counter_set_list_lock);
2196+ atomic64_inc(&qtu_events.counter_set_changes);
2197+ res = 0;
2198+
2199+err:
2200+ return res;
2201+}
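
A minimal example of driving this handler; the chosen set index is what get_active_counter_set() hands to tag_stat_update() above. (On Android the sets are conventionally used to separate background from foreground traffic, though nothing in this module assigns them a meaning.)

    s 1 10005    account uid 10005's subsequent traffic under counter set 1
    s 0 10005    switch back to counter set 0
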
2202+
2203+static int ctrl_cmd_tag(const char *input)
2204+{
2205+ char cmd;
2206+ int sock_fd = 0;
2207+ kuid_t uid;
2208+ unsigned int uid_int = 0;
2209+ tag_t acct_tag = make_atag_from_value(0);
2210+ tag_t full_tag;
2211+ struct socket *el_socket;
2212+ int res, argc;
2213+ struct sock_tag *sock_tag_entry;
2214+ struct tag_ref *tag_ref_entry;
2215+ struct uid_tag_data *uid_tag_data_entry;
2216+ struct proc_qtu_data *pqd_entry;
2217+
2218+ /* Unassigned args will get defaulted later. */
2219+ argc = sscanf(input, "%c %d %llu %u", &cmd, &sock_fd, &acct_tag, &uid_int);
2220+ uid = make_kuid(&init_user_ns, uid_int);
2221+ CT_DEBUG("qtaguid: ctrl_tag(%s): argc=%d cmd=%c sock_fd=%d "
2222+ "acct_tag=0x%llx uid=%u\n", input, argc, cmd, sock_fd,
2223+ acct_tag, uid_int);
2224+ if (argc < 2) {
2225+ res = -EINVAL;
2226+ goto err;
2227+ }
2228+ el_socket = sockfd_lookup(sock_fd, &res); /* This locks the file */
2229+ if (!el_socket) {
2230+ pr_info("qtaguid: ctrl_tag(%s): failed to lookup"
2231+ " sock_fd=%d err=%d pid=%u tgid=%u uid=%u\n",
2232+ input, sock_fd, res, current->pid, current->tgid,
2233+ from_kuid(&init_user_ns, current_fsuid()));
2234+ goto err;
2235+ }
2236+ CT_DEBUG("qtaguid: ctrl_tag(%s): socket->...->sk_refcnt=%d ->sk=%p\n",
2237+ input, refcount_read(&el_socket->sk->sk_refcnt),
2238+ el_socket->sk);
2239+ if (argc < 3) {
2240+ acct_tag = make_atag_from_value(0);
2241+ } else if (!valid_atag(acct_tag)) {
2242+ pr_info("qtaguid: ctrl_tag(%s): invalid tag\n", input);
2243+ res = -EINVAL;
2244+ goto err_put;
2245+ }
2246+ CT_DEBUG("qtaguid: ctrl_tag(%s): "
2247+ "pid=%u tgid=%u uid=%u euid=%u fsuid=%u "
2248+ "ctrl.gid=%u in_group()=%d in_egroup()=%d\n",
2249+ input, current->pid, current->tgid,
2250+ from_kuid(&init_user_ns, current_uid()),
2251+ from_kuid(&init_user_ns, current_euid()),
2252+ from_kuid(&init_user_ns, current_fsuid()),
2253+ from_kgid(&init_user_ns, xt_qtaguid_ctrl_file->gid),
2254+ in_group_p(xt_qtaguid_ctrl_file->gid),
2255+ in_egroup_p(xt_qtaguid_ctrl_file->gid));
2256+ if (argc < 4) {
2257+ uid = current_fsuid();
2258+ uid_int = from_kuid(&init_user_ns, uid);
2259+ } else if (!can_impersonate_uid(uid)) {
2260+ pr_info("qtaguid: ctrl_tag(%s): "
2261+ "insufficient priv from pid=%u tgid=%u uid=%u\n",
2262+ input, current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
2263+ res = -EPERM;
2264+ goto err_put;
2265+ }
2266+ full_tag = combine_atag_with_uid(acct_tag, uid_int);
2267+
2268+ spin_lock_bh(&sock_tag_list_lock);
2269+ spin_lock_bh(&uid_tag_data_tree_lock);
2270+ sock_tag_entry = get_sock_stat_nl(el_socket->sk);
2271+ tag_ref_entry = get_tag_ref(full_tag, &uid_tag_data_entry);
2272+ if (IS_ERR(tag_ref_entry)) {
2273+ res = PTR_ERR(tag_ref_entry);
2274+ spin_unlock_bh(&uid_tag_data_tree_lock);
2275+ spin_unlock_bh(&sock_tag_list_lock);
2276+ goto err_put;
2277+ }
2278+ tag_ref_entry->num_sock_tags++;
2279+ if (sock_tag_entry) {
2280+ struct tag_ref *prev_tag_ref_entry;
2281+
2282+ CT_DEBUG("qtaguid: ctrl_tag(%s): retag for sk=%p "
2283+ "st@%p ...->sk_refcnt=%d\n",
2284+ input, el_socket->sk, sock_tag_entry,
2285+ refcount_read(&el_socket->sk->sk_refcnt));
2286+ prev_tag_ref_entry = lookup_tag_ref(sock_tag_entry->tag,
2287+ &uid_tag_data_entry);
2288+ BUG_ON(IS_ERR_OR_NULL(prev_tag_ref_entry));
2289+ BUG_ON(prev_tag_ref_entry->num_sock_tags <= 0);
2290+ prev_tag_ref_entry->num_sock_tags--;
2291+ sock_tag_entry->tag = full_tag;
2292+ } else {
2293+ CT_DEBUG("qtaguid: ctrl_tag(%s): newtag for sk=%p\n",
2294+ input, el_socket->sk);
2295+ sock_tag_entry = kzalloc(sizeof(*sock_tag_entry),
2296+ GFP_ATOMIC);
2297+ if (!sock_tag_entry) {
2298+ pr_err("qtaguid: ctrl_tag(%s): "
2299+ "socket tag alloc failed\n",
2300+ input);
2301+ BUG_ON(tag_ref_entry->num_sock_tags <= 0);
2302+ tag_ref_entry->num_sock_tags--;
2303+ free_tag_ref_from_utd_entry(tag_ref_entry,
2304+ uid_tag_data_entry);
2305+ spin_unlock_bh(&uid_tag_data_tree_lock);
2306+ spin_unlock_bh(&sock_tag_list_lock);
2307+ res = -ENOMEM;
2308+ goto err_put;
2309+ }
2310+ /*
2311+ * Hold the sk refcount here to make sure the sk pointer cannot
2312+ * be freed and reused
2313+ */
2314+ sock_hold(el_socket->sk);
2315+ sock_tag_entry->sk = el_socket->sk;
2316+ sock_tag_entry->pid = current->tgid;
2317+ sock_tag_entry->tag = combine_atag_with_uid(acct_tag, uid_int);
2318+ pqd_entry = proc_qtu_data_tree_search(
2319+ &proc_qtu_data_tree, current->tgid);
2320+ /*
2321+ * TODO: remove if, and start failing.
2322+ * At first, we want to catch user-space code that is not
2323+ * opening /dev/xt_qtaguid.
2324+ */
2325+ if (IS_ERR_OR_NULL(pqd_entry))
2326+ pr_warn_once(
2327+ "qtaguid: %s(): "
2328+ "User space forgot to open /dev/xt_qtaguid? "
2329+ "pid=%u tgid=%u uid=%u\n", __func__,
2330+ current->pid, current->tgid,
2331+ from_kuid(&init_user_ns, current_fsuid()));
2332+ else
2333+ list_add(&sock_tag_entry->list,
2334+ &pqd_entry->sock_tag_list);
2335+
2336+ sock_tag_tree_insert(sock_tag_entry, &sock_tag_tree);
2337+ atomic64_inc(&qtu_events.sockets_tagged);
2338+ }
2339+ spin_unlock_bh(&uid_tag_data_tree_lock);
2340+ spin_unlock_bh(&sock_tag_list_lock);
2341+ /* We keep the ref to the sk until it is untagged */
2342+ CT_DEBUG("qtaguid: ctrl_tag(%s): done st@%p ...->sk_refcnt=%d\n",
2343+ input, sock_tag_entry,
2344+ refcount_read(&el_socket->sk->sk_refcnt));
2345+ sockfd_put(el_socket);
2346+ return 0;
2347+
2348+err_put:
2349+ CT_DEBUG("qtaguid: ctrl_tag(%s): done. ...->sk_refcnt=%d\n",
2350+ input, refcount_read(&el_socket->sk->sk_refcnt) - 1);
2351+ /* Release the sock_fd that was grabbed by sockfd_lookup(). */
2352+ sockfd_put(el_socket);
2353+ return res;
2354+
2355+err:
2356+ CT_DEBUG("qtaguid: ctrl_tag(%s): done.\n", input);
2357+ return res;
2358+}
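
Put together, tagging a socket from userspace is one formatted write to the ctrl file. A minimal sketch (hypothetical helper name, error handling trimmed; the path matches the proc hierarchy created by qtaguid_proc_register() below):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Tag sock_fd so its traffic is billed to uid under acct_tag. */
    static int qtaguid_tag_socket(int sock_fd, unsigned long long acct_tag,
                                  unsigned int uid)
    {
            char cmd[64];
            int len, ctrl_fd;

            ctrl_fd = open("/proc/net/xt_qtaguid/ctrl", O_WRONLY);
            if (ctrl_fd < 0)
                    return -1;
            /* Matches the "%c %d %llu %u" format parsed by ctrl_cmd_tag(). */
            len = snprintf(cmd, sizeof(cmd), "t %d %llu %u",
                           sock_fd, acct_tag, uid);
            if (write(ctrl_fd, cmd, len) != len) {
                    close(ctrl_fd);
                    return -1;
            }
            return close(ctrl_fd);
    }
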
2359+
2360+static int ctrl_cmd_untag(const char *input)
2361+{
2362+ char cmd;
2363+ int sock_fd = 0;
2364+ struct socket *el_socket;
2365+ int res, argc;
2366+
2367+ argc = sscanf(input, "%c %d", &cmd, &sock_fd);
2368+ CT_DEBUG("qtaguid: ctrl_untag(%s): argc=%d cmd=%c sock_fd=%d\n",
2369+ input, argc, cmd, sock_fd);
2370+ if (argc < 2) {
2371+ res = -EINVAL;
2372+ return res;
2373+ }
2374+ el_socket = sockfd_lookup(sock_fd, &res); /* This locks the file */
2375+ if (!el_socket) {
2376+ pr_info("qtaguid: ctrl_untag(%s): failed to lookup"
2377+ " sock_fd=%d err=%d pid=%u tgid=%u uid=%u\n",
2378+ input, sock_fd, res, current->pid, current->tgid,
2379+ from_kuid(&init_user_ns, current_fsuid()));
2380+ return res;
2381+ }
2382+ CT_DEBUG("qtaguid: ctrl_untag(%s): socket->...->f_count=%ld ->sk=%p\n",
2383+ input, atomic_long_read(&el_socket->file->f_count),
2384+ el_socket->sk);
2385+ res = qtaguid_untag(el_socket, false);
2386+ sockfd_put(el_socket);
2387+ return res;
2388+}
2389+
2390+int qtaguid_untag(struct socket *el_socket, bool kernel)
2391+{
2392+ int res;
2393+ pid_t pid;
2394+ struct sock_tag *sock_tag_entry;
2395+ struct tag_ref *tag_ref_entry;
2396+ struct uid_tag_data *utd_entry;
2397+ struct proc_qtu_data *pqd_entry;
2398+
2399+ spin_lock_bh(&sock_tag_list_lock);
2400+ sock_tag_entry = get_sock_stat_nl(el_socket->sk);
2401+ if (!sock_tag_entry) {
2402+ spin_unlock_bh(&sock_tag_list_lock);
2403+ res = -EINVAL;
2404+ return res;
2405+ }
2406+ /*
2407+ * The socket already belongs to the current process
2408+ * so it can do whatever it wants to it.
2409+ */
2410+ rb_erase(&sock_tag_entry->sock_node, &sock_tag_tree);
2411+
2412+ tag_ref_entry = lookup_tag_ref(sock_tag_entry->tag, &utd_entry);
2413+ BUG_ON(!tag_ref_entry);
2414+ BUG_ON(tag_ref_entry->num_sock_tags <= 0);
2415+ spin_lock_bh(&uid_tag_data_tree_lock);
2416+ if (kernel)
2417+ pid = sock_tag_entry->pid;
2418+ else
2419+ pid = current->tgid;
2420+ pqd_entry = proc_qtu_data_tree_search(
2421+ &proc_qtu_data_tree, pid);
2422+ /*
2423+ * TODO: remove if, and start failing.
2424+ * At first, we want to catch user-space code that is not
2425+ * opening /dev/xt_qtaguid.
2426+ */
2427+ if (IS_ERR_OR_NULL(pqd_entry) || !sock_tag_entry->list.next) {
2428+ pr_warn_once("qtaguid: %s(): "
2429+ "User space forgot to open /dev/xt_qtaguid? "
2430+ "pid=%u tgid=%u sk_pid=%u, uid=%u\n", __func__,
2431+ current->pid, current->tgid, sock_tag_entry->pid,
2432+ from_kuid(&init_user_ns, current_fsuid()));
2433+ } else {
2434+ list_del(&sock_tag_entry->list);
2435+ }
2436+ spin_unlock_bh(&uid_tag_data_tree_lock);
2437+ /*
2438+ * We don't free tag_ref from the utd_entry here,
2439+ * only during a cmd_delete().
2440+ */
2441+ tag_ref_entry->num_sock_tags--;
2442+ spin_unlock_bh(&sock_tag_list_lock);
2443+ /*
2444+ * Release the sk reference that was grabbed at tag time.
2445+ */
2446+ sock_put(sock_tag_entry->sk);
2447+ CT_DEBUG("qtaguid: done. st@%p ...->sk_refcnt=%d\n",
2448+ sock_tag_entry,
2449+ refcount_read(&el_socket->sk->sk_refcnt));
2450+
2451+ kfree(sock_tag_entry);
2452+ atomic64_inc(&qtu_events.sockets_untagged);
2453+
2454+ return 0;
2455+}
2456+
2457+static ssize_t qtaguid_ctrl_parse(const char *input, size_t count)
2458+{
2459+ char cmd;
2460+ ssize_t res;
2461+
2462+ CT_DEBUG("qtaguid: ctrl(%s): pid=%u tgid=%u uid=%u\n",
2463+ input, current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
2464+
2465+ cmd = input[0];
2466+ /* Collect params for commands */
2467+ switch (cmd) {
2468+ case 'd':
2469+ res = ctrl_cmd_delete(input);
2470+ break;
2471+
2472+ case 's':
2473+ res = ctrl_cmd_counter_set(input);
2474+ break;
2475+
2476+ case 't':
2477+ res = ctrl_cmd_tag(input);
2478+ break;
2479+
2480+ case 'u':
2481+ res = ctrl_cmd_untag(input);
2482+ break;
2483+
2484+ default:
2485+ res = -EINVAL;
2486+ goto err;
2487+ }
2488+ if (!res)
2489+ res = count;
2490+err:
2491+ CT_DEBUG("qtaguid: ctrl(%s): res=%zd\n", input, res);
2492+ return res;
2493+}
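
Collected from the sscanf() formats in the four handlers above, the grammar this parser accepts is (numeric fields are decimal; bracketed fields are optional and defaulted as described in each handler):

    t <sock_fd> [<acct_tag> [<uid>]]    tag the socket behind sock_fd
    u <sock_fd>                         untag it
    s <counter_set> <uid>               select the uid's active counter set
    d <acct_tag> [<uid>]                delete matching tags, stats and counter sets
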
2494+
2495+#define MAX_QTAGUID_CTRL_INPUT_LEN 255
2496+static ssize_t qtaguid_ctrl_proc_write(struct file *file, const char __user *buffer,
2497+ size_t count, loff_t *offp)
2498+{
2499+ char input_buf[MAX_QTAGUID_CTRL_INPUT_LEN];
2500+
2501+ if (unlikely(module_passive))
2502+ return count;
2503+
2504+ if (count >= MAX_QTAGUID_CTRL_INPUT_LEN)
2505+ return -EINVAL;
2506+
2507+ if (copy_from_user(input_buf, buffer, count))
2508+ return -EFAULT;
2509+
2510+ input_buf[count] = '\0';
2511+ return qtaguid_ctrl_parse(input_buf, count);
2512+}
2513+
2514+struct proc_print_info {
2515+ struct iface_stat *iface_entry;
2516+ int item_index;
2517+ tag_t tag; /* tag found by reading to tag_pos */
2518+ off_t tag_pos;
2519+ int tag_item_index;
2520+};
2521+
2522+static void pp_stats_header(struct seq_file *m)
2523+{
2524+ seq_puts(m,
2525+ "idx iface acct_tag_hex uid_tag_int cnt_set "
2526+ "rx_bytes rx_packets "
2527+ "tx_bytes tx_packets "
2528+ "rx_tcp_bytes rx_tcp_packets "
2529+ "rx_udp_bytes rx_udp_packets "
2530+ "rx_other_bytes rx_other_packets "
2531+ "tx_tcp_bytes tx_tcp_packets "
2532+ "tx_udp_bytes tx_udp_packets "
2533+ "tx_other_bytes tx_other_packets\n");
2534+}
2535+
2536+static int pp_stats_line(struct seq_file *m, struct tag_stat *ts_entry,
2537+ int cnt_set)
2538+{
2539+ struct data_counters *cnts;
2540+ tag_t tag = ts_entry->tn.tag;
2541+ uid_t stat_uid = get_uid_from_tag(tag);
2542+ struct proc_print_info *ppi = m->private;
2543+ /* Detailed tags are not available to everybody */
2544+ if (!can_read_other_uid_stats(make_kuid(&init_user_ns, stat_uid))) {
2545+ CT_DEBUG("qtaguid: stats line: "
2546+ "%s 0x%llx %u: insufficient priv "
2547+ "from pid=%u tgid=%u uid=%u stats.gid=%u\n",
2548+ ppi->iface_entry->ifname,
2549+ get_atag_from_tag(tag), stat_uid,
2550+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()),
2551+ from_kgid(&init_user_ns, xt_qtaguid_stats_file->gid));
2552+ return 0;
2553+ }
2554+ ppi->item_index++;
2555+ cnts = &ts_entry->counters;
2556+ seq_printf(m, "%d %s 0x%llx %u %u "
2557+ "%llu %llu "
2558+ "%llu %llu "
2559+ "%llu %llu "
2560+ "%llu %llu "
2561+ "%llu %llu "
2562+ "%llu %llu "
2563+ "%llu %llu "
2564+ "%llu %llu\n",
2565+ ppi->item_index,
2566+ ppi->iface_entry->ifname,
2567+ get_atag_from_tag(tag),
2568+ stat_uid,
2569+ cnt_set,
2570+ dc_sum_bytes(cnts, cnt_set, IFS_RX),
2571+ dc_sum_packets(cnts, cnt_set, IFS_RX),
2572+ dc_sum_bytes(cnts, cnt_set, IFS_TX),
2573+ dc_sum_packets(cnts, cnt_set, IFS_TX),
2574+ cnts->bpc[cnt_set][IFS_RX][IFS_TCP].bytes,
2575+ cnts->bpc[cnt_set][IFS_RX][IFS_TCP].packets,
2576+ cnts->bpc[cnt_set][IFS_RX][IFS_UDP].bytes,
2577+ cnts->bpc[cnt_set][IFS_RX][IFS_UDP].packets,
2578+ cnts->bpc[cnt_set][IFS_RX][IFS_PROTO_OTHER].bytes,
2579+ cnts->bpc[cnt_set][IFS_RX][IFS_PROTO_OTHER].packets,
2580+ cnts->bpc[cnt_set][IFS_TX][IFS_TCP].bytes,
2581+ cnts->bpc[cnt_set][IFS_TX][IFS_TCP].packets,
2582+ cnts->bpc[cnt_set][IFS_TX][IFS_UDP].bytes,
2583+ cnts->bpc[cnt_set][IFS_TX][IFS_UDP].packets,
2584+ cnts->bpc[cnt_set][IFS_TX][IFS_PROTO_OTHER].bytes,
2585+ cnts->bpc[cnt_set][IFS_TX][IFS_PROTO_OTHER].packets);
2586+ return seq_has_overflowed(m) ? -ENOSPC : 1;
2587+}
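
A sample stats line in the format printed above (values illustrative). The rx_bytes/rx_packets and tx_bytes/tx_packets columns are the dc_sum_*() totals of the three per-protocol buckets that follow them:

    idx iface acct_tag_hex uid_tag_int cnt_set rx_bytes rx_packets tx_bytes tx_packets ...
    2 wlan0 0x0 10005 0 4716 12 1252 12 4716 12 0 0 0 0 1252 12 0 0 0 0
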
2588+
2589+static bool pp_sets(struct seq_file *m, struct tag_stat *ts_entry)
2590+{
2591+ int ret;
2592+ int counter_set;
2593+ for (counter_set = 0; counter_set < IFS_MAX_COUNTER_SETS;
2594+ counter_set++) {
2595+ ret = pp_stats_line(m, ts_entry, counter_set);
2596+ if (ret < 0)
2597+ return false;
2598+ }
2599+ return true;
2600+}
2601+
2602+static int qtaguid_stats_proc_iface_stat_ptr_valid(struct iface_stat *ptr)
2603+{
2604+ struct iface_stat *iface_entry;
2605+
2606+ if (!ptr)
2607+ return false;
2608+
2609+ list_for_each_entry(iface_entry, &iface_stat_list, list)
2610+ if (iface_entry == ptr)
2611+ return true;
2612+ return false;
2613+}
2614+
2615+static void qtaguid_stats_proc_next_iface_entry(struct proc_print_info *ppi)
2616+{
2617+ spin_unlock_bh(&ppi->iface_entry->tag_stat_list_lock);
2618+ list_for_each_entry_continue(ppi->iface_entry, &iface_stat_list, list) {
2619+ spin_lock_bh(&ppi->iface_entry->tag_stat_list_lock);
2620+ return;
2621+ }
2622+ ppi->iface_entry = NULL;
2623+}
2624+
2625+static void *qtaguid_stats_proc_next(struct seq_file *m, void *v, loff_t *pos)
2626+{
2627+ struct proc_print_info *ppi = m->private;
2628+ struct tag_stat *ts_entry;
2629+ struct rb_node *node;
2630+
2631+ if (!v) {
2632+ pr_err("qtaguid: %s(): unexpected v: NULL\n", __func__);
2633+ return NULL;
2634+ }
2635+
2636+ (*pos)++;
2637+
2638+ if (!ppi->iface_entry || unlikely(module_passive))
2639+ return NULL;
2640+
2641+ if (v == SEQ_START_TOKEN)
2642+ node = rb_first(&ppi->iface_entry->tag_stat_tree);
2643+ else
2644+ node = rb_next(&((struct tag_stat *)v)->tn.node);
2645+
2646+ while (!node) {
2647+ qtaguid_stats_proc_next_iface_entry(ppi);
2648+ if (!ppi->iface_entry)
2649+ return NULL;
2650+ node = rb_first(&ppi->iface_entry->tag_stat_tree);
2651+ }
2652+
2653+ ts_entry = rb_entry(node, struct tag_stat, tn.node);
2654+ ppi->tag = ts_entry->tn.tag;
2655+ ppi->tag_pos = *pos;
2656+ ppi->tag_item_index = ppi->item_index;
2657+ return ts_entry;
2658+}
2659+
2660+static void *qtaguid_stats_proc_start(struct seq_file *m, loff_t *pos)
2661+{
2662+ struct proc_print_info *ppi = m->private;
2663+ struct tag_stat *ts_entry = NULL;
2664+
2665+ spin_lock_bh(&iface_stat_list_lock);
2666+
2667+ if (*pos == 0) {
2668+ ppi->item_index = 1;
2669+ ppi->tag_pos = 0;
2670+ if (list_empty(&iface_stat_list)) {
2671+ ppi->iface_entry = NULL;
2672+ } else {
2673+ ppi->iface_entry = list_first_entry(&iface_stat_list,
2674+ struct iface_stat,
2675+ list);
2676+ spin_lock_bh(&ppi->iface_entry->tag_stat_list_lock);
2677+ }
2678+ return SEQ_START_TOKEN;
2679+ }
2680+ if (!qtaguid_stats_proc_iface_stat_ptr_valid(ppi->iface_entry)) {
2681+ if (ppi->iface_entry) {
2682+ pr_err("qtaguid: %s(): iface_entry %p not found\n",
2683+ __func__, ppi->iface_entry);
2684+ ppi->iface_entry = NULL;
2685+ }
2686+ return NULL;
2687+ }
2688+
2689+ spin_lock_bh(&ppi->iface_entry->tag_stat_list_lock);
2690+
2691+ if (!ppi->tag_pos) {
2692+ /* seq_read skipped first next call */
2693+ ts_entry = SEQ_START_TOKEN;
2694+ } else {
2695+ ts_entry = tag_stat_tree_search(
2696+ &ppi->iface_entry->tag_stat_tree, ppi->tag);
2697+ if (!ts_entry) {
2698+ pr_info("qtaguid: %s(): tag_stat.tag 0x%llx not found. Abort.\n",
2699+ __func__, ppi->tag);
2700+ return NULL;
2701+ }
2702+ }
2703+
2704+ if (*pos == ppi->tag_pos) { /* normal resume */
2705+ ppi->item_index = ppi->tag_item_index;
2706+ } else {
2707+ /* seq_read skipped a next call */
2708+ *pos = ppi->tag_pos;
2709+ ts_entry = qtaguid_stats_proc_next(m, ts_entry, pos);
2710+ }
2711+
2712+ return ts_entry;
2713+}
2714+
2715+static void qtaguid_stats_proc_stop(struct seq_file *m, void *v)
2716+{
2717+ struct proc_print_info *ppi = m->private;
2718+ if (ppi->iface_entry)
2719+ spin_unlock_bh(&ppi->iface_entry->tag_stat_list_lock);
2720+ spin_unlock_bh(&iface_stat_list_lock);
2721+}
2722+
2723+/*
2724+ * Procfs reader to get all tag stats using style "1)" as described in
2725+ * fs/proc/generic.c
2726+ * Groups all protocols tx/rx bytes.
2727+ */
2728+static int qtaguid_stats_proc_show(struct seq_file *m, void *v)
2729+{
2730+ struct tag_stat *ts_entry = v;
2731+
2732+ if (v == SEQ_START_TOKEN)
2733+ pp_stats_header(m);
2734+ else
2735+ pp_sets(m, ts_entry);
2736+
2737+ return 0;
2738+}
2739+
2740+/*------------------------------------------*/
2741+static int qtudev_open(struct inode *inode, struct file *file)
2742+{
2743+ struct uid_tag_data *utd_entry;
2744+ struct proc_qtu_data *pqd_entry;
2745+ struct proc_qtu_data *new_pqd_entry;
2746+ int res;
2747+ bool utd_entry_found;
2748+
2749+ if (unlikely(qtu_proc_handling_passive))
2750+ return 0;
2751+
2752+ DR_DEBUG("qtaguid: qtudev_open(): pid=%u tgid=%u uid=%u\n",
2753+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
2754+
2755+ spin_lock_bh(&uid_tag_data_tree_lock);
2756+
2757+ /* Look for existing uid data, or alloc one. */
2758+ utd_entry = get_uid_data(from_kuid(&init_user_ns, current_fsuid()), &utd_entry_found);
2759+ if (IS_ERR_OR_NULL(utd_entry)) {
2760+ res = PTR_ERR(utd_entry);
2761+ goto err_unlock;
2762+ }
2763+
2764+ /* Look for existing PID based proc_data */
2765+ pqd_entry = proc_qtu_data_tree_search(&proc_qtu_data_tree,
2766+ current->tgid);
2767+ if (pqd_entry) {
2768+ pr_err("qtaguid: qtudev_open(): %u/%u %u "
2769+ "%s already opened\n",
2770+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()),
2771+ QTU_DEV_NAME);
2772+ res = -EBUSY;
2773+ goto err_unlock_free_utd;
2774+ }
2775+
2776+ new_pqd_entry = kzalloc(sizeof(*new_pqd_entry), GFP_ATOMIC);
2777+ if (!new_pqd_entry) {
2778+ pr_err("qtaguid: qtudev_open(): %u/%u %u: "
2779+ "proc data alloc failed\n",
2780+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
2781+ res = -ENOMEM;
2782+ goto err_unlock_free_utd;
2783+ }
2784+ new_pqd_entry->pid = current->tgid;
2785+ INIT_LIST_HEAD(&new_pqd_entry->sock_tag_list);
2786+ new_pqd_entry->parent_tag_data = utd_entry;
2787+ utd_entry->num_pqd++;
2788+
2789+ proc_qtu_data_tree_insert(new_pqd_entry,
2790+ &proc_qtu_data_tree);
2791+
2792+ spin_unlock_bh(&uid_tag_data_tree_lock);
2793+ DR_DEBUG("qtaguid: tracking data for uid=%u in pqd=%p\n",
2794+ from_kuid(&init_user_ns, current_fsuid()), new_pqd_entry);
2795+ file->private_data = new_pqd_entry;
2796+ return 0;
2797+
2798+err_unlock_free_utd:
2799+ if (!utd_entry_found) {
2800+ rb_erase(&utd_entry->node, &uid_tag_data_tree);
2801+ kfree(utd_entry);
2802+ }
2803+err_unlock:
2804+ spin_unlock_bh(&uid_tag_data_tree_lock);
2805+ return res;
2806+}
2807+
2808+static int qtudev_release(struct inode *inode, struct file *file)
2809+{
2810+ struct proc_qtu_data *pqd_entry = file->private_data;
2811+ struct uid_tag_data *utd_entry = pqd_entry->parent_tag_data;
2812+ struct sock_tag *st_entry;
2813+ struct rb_root st_to_free_tree = RB_ROOT;
2814+ struct list_head *entry, *next;
2815+ struct tag_ref *tr;
2816+
2817+ if (unlikely(qtu_proc_handling_passive))
2818+ return 0;
2819+
2820+ /*
2821+ * Do not trust current->pid; it might just be a kworker cleaning
2822+ * up after a dead proc.
2823+ */
2824+ DR_DEBUG("qtaguid: qtudev_release(): "
2825+ "pid=%u tgid=%u uid=%u "
2826+ "pqd_entry=%p->pid=%u utd_entry=%p->active_tags=%d\n",
2827+ current->pid, current->tgid, pqd_entry->parent_tag_data->uid,
2828+ pqd_entry, pqd_entry->pid, utd_entry,
2829+ utd_entry->num_active_tags);
2830+
2831+ spin_lock_bh(&sock_tag_list_lock);
2832+ spin_lock_bh(&uid_tag_data_tree_lock);
2833+
2834+ list_for_each_safe(entry, next, &pqd_entry->sock_tag_list) {
2835+ st_entry = list_entry(entry, struct sock_tag, list);
2836+ DR_DEBUG("qtaguid: %s(): "
2837+ "erase sock_tag=%p->sk=%p pid=%u tgid=%u uid=%u\n",
2838+ __func__,
2839+ st_entry, st_entry->sk,
2840+ current->pid, current->tgid,
2841+ pqd_entry->parent_tag_data->uid);
2842+
2843+ utd_entry = uid_tag_data_tree_search(
2844+ &uid_tag_data_tree,
2845+ get_uid_from_tag(st_entry->tag));
2846+ BUG_ON(IS_ERR_OR_NULL(utd_entry));
2847+ DR_DEBUG("qtaguid: %s(): "
2848+ "looking for tag=0x%llx in utd_entry=%p\n", __func__,
2849+ st_entry->tag, utd_entry);
2850+ tr = tag_ref_tree_search(&utd_entry->tag_ref_tree,
2851+ st_entry->tag);
2852+ BUG_ON(!tr);
2853+ BUG_ON(tr->num_sock_tags <= 0);
2854+ tr->num_sock_tags--;
2855+ free_tag_ref_from_utd_entry(tr, utd_entry);
2856+
2857+ rb_erase(&st_entry->sock_node, &sock_tag_tree);
2858+ list_del(&st_entry->list);
2859+ /* Can't sockfd_put() within spinlock, do it later. */
2860+ sock_tag_tree_insert(st_entry, &st_to_free_tree);
2861+
2862+ /*
2863+ * Try to free the utd_entry if no other proc_qtu_data is
2864+ * using it (num_pqd is 0) and it doesn't have active tags
2865+ * (num_active_tags is 0).
2866+ */
2867+ put_utd_entry(utd_entry);
2868+ }
2869+
2870+ rb_erase(&pqd_entry->node, &proc_qtu_data_tree);
2871+ BUG_ON(pqd_entry->parent_tag_data->num_pqd < 1);
2872+ pqd_entry->parent_tag_data->num_pqd--;
2873+ put_utd_entry(pqd_entry->parent_tag_data);
2874+ kfree(pqd_entry);
2875+ file->private_data = NULL;
2876+
2877+ spin_unlock_bh(&uid_tag_data_tree_lock);
2878+ spin_unlock_bh(&sock_tag_list_lock);
2879+
2880+
2881+ sock_tag_tree_erase(&st_to_free_tree);
2882+
2883+ spin_lock_bh(&sock_tag_list_lock);
2884+ prdebug_full_state_locked(0, "%s(): pid=%u tgid=%u", __func__,
2885+ current->pid, current->tgid);
2886+ spin_unlock_bh(&sock_tag_list_lock);
2887+ return 0;
2888+}
2889+
2890+/*------------------------------------------*/
2891+static const struct file_operations qtudev_fops = {
2892+ .owner = THIS_MODULE,
2893+ .open = qtudev_open,
2894+ .release = qtudev_release,
2895+};
2896+
2897+static struct miscdevice qtu_device = {
2898+ .minor = MISC_DYNAMIC_MINOR,
2899+ .name = QTU_DEV_NAME,
2900+ .fops = &qtudev_fops,
2901+ /* How sad it doesn't allow for defaults: .mode = S_IRUGO | S_IWUSR */
2902+};
2903+
2904+static const struct seq_operations proc_qtaguid_ctrl_seqops = {
2905+ .start = qtaguid_ctrl_proc_start,
2906+ .next = qtaguid_ctrl_proc_next,
2907+ .stop = qtaguid_ctrl_proc_stop,
2908+ .show = qtaguid_ctrl_proc_show,
2909+};
2910+
2911+static int proc_qtaguid_ctrl_open(struct inode *inode, struct file *file)
2912+{
2913+ return seq_open_private(file, &proc_qtaguid_ctrl_seqops,
2914+ sizeof(struct proc_ctrl_print_info));
2915+}
2916+
2917+static const struct file_operations proc_qtaguid_ctrl_fops = {
2918+ .open = proc_qtaguid_ctrl_open,
2919+ .read = seq_read,
2920+ .write = qtaguid_ctrl_proc_write,
2921+ .llseek = seq_lseek,
2922+ .release = seq_release_private,
2923+};
2924+
2925+static const struct seq_operations proc_qtaguid_stats_seqops = {
2926+ .start = qtaguid_stats_proc_start,
2927+ .next = qtaguid_stats_proc_next,
2928+ .stop = qtaguid_stats_proc_stop,
2929+ .show = qtaguid_stats_proc_show,
2930+};
2931+
2932+static int proc_qtaguid_stats_open(struct inode *inode, struct file *file)
2933+{
2934+ return seq_open_private(file, &proc_qtaguid_stats_seqops,
2935+ sizeof(struct proc_print_info));
2936+}
2937+
2938+static const struct file_operations proc_qtaguid_stats_fops = {
2939+ .open = proc_qtaguid_stats_open,
2940+ .read = seq_read,
2941+ .llseek = seq_lseek,
2942+ .release = seq_release_private,
2943+};
2944+
2945+/*------------------------------------------*/
2946+static int __init qtaguid_proc_register(struct proc_dir_entry **res_procdir)
2947+{
2948+ int ret;
2949+ *res_procdir = proc_mkdir(module_procdirname, init_net.proc_net);
2950+ if (!*res_procdir) {
2951+ pr_err("qtaguid: failed to create proc/.../xt_qtaguid\n");
2952+ ret = -ENOMEM;
2953+ goto no_dir;
2954+ }
2955+
2956+ xt_qtaguid_ctrl_file = proc_create_data("ctrl", proc_ctrl_perms,
2957+ *res_procdir,
2958+ &proc_qtaguid_ctrl_fops,
2959+ NULL);
2960+ if (!xt_qtaguid_ctrl_file) {
2961+ pr_err("qtaguid: failed to create xt_qtaguid/ctrl "
2962+ " file\n");
2963+ ret = -ENOMEM;
2964+ goto no_ctrl_entry;
2965+ }
2966+
2967+ xt_qtaguid_stats_file = proc_create_data("stats", proc_stats_perms,
2968+ *res_procdir,
2969+ &proc_qtaguid_stats_fops,
2970+ NULL);
2971+ if (!xt_qtaguid_stats_file) {
2972+ pr_err("qtaguid: failed to create xt_qtaguid/stats "
2973+ "file\n");
2974+ ret = -ENOMEM;
2975+ goto no_stats_entry;
2976+ }
2977+ /*
2978+ * TODO: add support for counter hacking
2979+ * xt_qtaguid_stats_file->write_proc = qtaguid_stats_proc_write;
2980+ */
2981+ return 0;
2982+
2983+no_stats_entry:
2984+ remove_proc_entry("ctrl", *res_procdir);
2985+no_ctrl_entry:
2986+ remove_proc_entry("xt_qtaguid", NULL);
2987+no_dir:
2988+ return ret;
2989+}
2990+
2991+static struct xt_match qtaguid_mt_reg __read_mostly = {
2992+ /*
2993+ * This module masquerades as the "owner" module so that iptables
2994+ * tools can deal with it.
2995+ */
2996+ .name = "owner",
2997+ .revision = 1,
2998+ .family = NFPROTO_UNSPEC,
2999+ .match = qtaguid_mt,
3000+ .matchsize = sizeof(struct xt_qtaguid_match_info),
3001+ .me = THIS_MODULE,
3002+};
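
Because the match above registers under the name "owner" with revision 1, stock iptables userspace needs no changes; on a kernel carrying this patch, a standard owner rule is serviced by qtaguid instead of xt_owner. An illustrative invocation (not part of the patch):

    iptables -A OUTPUT -m owner --uid-owner 10003 -j RETURN
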
3003+
3004+static int __init qtaguid_mt_init(void)
3005+{
3006+ if (qtaguid_proc_register(&xt_qtaguid_procdir)
3007+ || iface_stat_init(xt_qtaguid_procdir)
3008+ || xt_register_match(&qtaguid_mt_reg)
3009+ || misc_register(&qtu_device))
3010+ return -1;
3011+ return 0;
3012+}
3013+
3014+/*
3015+ * TODO: allow unloading of the module.
3016+ * For now stats are permanent.
3017+ * Kconfig forces 'y/n' and never an 'm'.
3018+ */
3019+
3020+module_init(qtaguid_mt_init);
3021+MODULE_AUTHOR("jpa <jpa@google.com>");
3022+MODULE_DESCRIPTION("Xtables: socket owner+tag matching and associated stats");
3023+MODULE_LICENSE("GPL");
3024+MODULE_ALIAS("ipt_owner");
3025+MODULE_ALIAS("ip6t_owner");
3026+MODULE_ALIAS("ipt_qtaguid");
3027+MODULE_ALIAS("ip6t_qtaguid");
--- /dev/null
+++ b/net/netfilter/xt_qtaguid_internal.h
@@ -0,0 +1,350 @@
1+/*
2+ * Kernel iptables module to track stats for packets based on user tags.
3+ *
4+ * (C) 2011 Google, Inc
5+ *
6+ * This program is free software; you can redistribute it and/or modify
7+ * it under the terms of the GNU General Public License version 2 as
8+ * published by the Free Software Foundation.
9+ */
10+#ifndef __XT_QTAGUID_INTERNAL_H__
11+#define __XT_QTAGUID_INTERNAL_H__
12+
13+#include <linux/types.h>
14+#include <linux/rbtree.h>
15+#include <linux/spinlock_types.h>
16+#include <linux/workqueue.h>
17+
18+/* Iface handling */
19+#define IDEBUG_MASK (1<<0)
20+/* Iptable Matching. Per packet. */
21+#define MDEBUG_MASK (1<<1)
22+/* Red-black tree handling. Per packet. */
23+#define RDEBUG_MASK (1<<2)
24+/* procfs ctrl/stats handling */
25+#define CDEBUG_MASK (1<<3)
26+/* dev and resource tracking */
27+#define DDEBUG_MASK (1<<4)
28+
29+/* E.g. (IDEBUG_MASK | CDEBUG_MASK | DDEBUG_MASK) */
30+#define DEFAULT_DEBUG_MASK 0
31+
32+/*
33+ * (Un)Define these *DEBUG to compile out/in the pr_debug calls.
34+ * All undef: text size ~ 0x3030; all def: ~ 0x4404.
35+ */
36+#define IDEBUG
37+#define MDEBUG
38+#define RDEBUG
39+#define CDEBUG
40+#define DDEBUG
41+
42+#define MSK_DEBUG(mask, ...) do { \
43+ if (unlikely(qtaguid_debug_mask & (mask))) \
44+ pr_debug(__VA_ARGS__); \
45+ } while (0)
46+#ifdef IDEBUG
47+#define IF_DEBUG(...) MSK_DEBUG(IDEBUG_MASK, __VA_ARGS__)
48+#else
49+#define IF_DEBUG(...) no_printk(__VA_ARGS__)
50+#endif
51+#ifdef MDEBUG
52+#define MT_DEBUG(...) MSK_DEBUG(MDEBUG_MASK, __VA_ARGS__)
53+#else
54+#define MT_DEBUG(...) no_printk(__VA_ARGS__)
55+#endif
56+#ifdef RDEBUG
57+#define RB_DEBUG(...) MSK_DEBUG(RDEBUG_MASK, __VA_ARGS__)
58+#else
59+#define RB_DEBUG(...) no_printk(__VA_ARGS__)
60+#endif
61+#ifdef CDEBUG
62+#define CT_DEBUG(...) MSK_DEBUG(CDEBUG_MASK, __VA_ARGS__)
63+#else
64+#define CT_DEBUG(...) no_printk(__VA_ARGS__)
65+#endif
66+#ifdef DDEBUG
67+#define DR_DEBUG(...) MSK_DEBUG(DDEBUG_MASK, __VA_ARGS__)
68+#else
69+#define DR_DEBUG(...) no_printk(__VA_ARGS__)
70+#endif
71+
72+extern uint qtaguid_debug_mask;
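
A minimal usage sketch, assuming the *DEBUG defines above stay compiled in: IF_DEBUG() output appears only once the runtime mask enables the corresponding bit. The helper name is illustrative and not part of the patch.

    /* Illustrative only; relies on the macros and extern declared above. */
    static inline void debug_mask_example(void)
    {
            qtaguid_debug_mask |= IDEBUG_MASK;           /* enable iface debugging */
            IF_DEBUG("qtaguid: iface debug enabled\n");  /* now actually printed */
    }
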
73+
74+/*---------------------------------------------------------------------------*/
75+/*
76+ * Tags:
77+ *
78+ * They represent what the data usage counters will be tracked against.
79+ * By default a tag is just based on the UID.
80+ * The UID is used as the base for policing, and can not be ignored.
81+ * So a tag will always at least represent a UID (uid_tag).
82+ *
83+ * A tag can be augmented with an "accounting tag" which is associated
84+ * with a UID.
85+ * User space can set the acct_tag portion of the tag which is then used
86+ * with sockets: all data belonging to that socket will be counted against the
87+ * tag. The policing is then based on the tag's uid_tag portion,
88+ * and stats are collected for the acct_tag portion separately.
89+ *
90+ * There could be
91+ * a: {acct_tag=1, uid_tag=10003}
92+ * b: {acct_tag=2, uid_tag=10003}
93+ * c: {acct_tag=3, uid_tag=10003}
94+ * d: {acct_tag=0, uid_tag=10003}
95+ * a, b, and c represent tags associated with specific sockets.
96+ * d is for the totals for that uid, including all untagged traffic.
97+ * Typically d is used with policing/quota rules.
98+ *
99+ * We want tag_t big enough to distinguish uid_t and acct_tag.
100+ * It might become a struct if needed.
101+ * Nothing should be using it as an int.
102+ */
103+typedef uint64_t tag_t; /* Only used via accessors */
104+
105+#define TAG_UID_MASK 0xFFFFFFFFULL
106+#define TAG_ACCT_MASK (~0xFFFFFFFFULL)
107+
108+static inline int tag_compare(tag_t t1, tag_t t2)
109+{
110+ return t1 < t2 ? -1 : t1 == t2 ? 0 : 1;
111+}
112+
113+static inline tag_t combine_atag_with_uid(tag_t acct_tag, uid_t uid)
114+{
115+ return acct_tag | uid;
116+}
117+static inline tag_t make_tag_from_uid(uid_t uid)
118+{
119+ return uid;
120+}
121+static inline uid_t get_uid_from_tag(tag_t tag)
122+{
123+ return tag & TAG_UID_MASK;
124+}
125+static inline tag_t get_utag_from_tag(tag_t tag)
126+{
127+ return tag & TAG_UID_MASK;
128+}
129+static inline tag_t get_atag_from_tag(tag_t tag)
130+{
131+ return tag & TAG_ACCT_MASK;
132+}
133+
134+static inline bool valid_atag(tag_t tag)
135+{
136+ return !(tag & TAG_UID_MASK);
137+}
138+static inline tag_t make_atag_from_value(uint32_t value)
139+{
140+ return (uint64_t)value << 32;
141+}
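
A hedged sketch of how these accessors compose and decompose a tag; the values and helper name are arbitrary, and the function is not part of the patch:

    static inline void tag_accessor_example(void)
    {
            tag_t atag = make_atag_from_value(2);           /* 0x2ULL << 32 */
            tag_t tag = combine_atag_with_uid(atag, 10003);

            /* get_uid_from_tag(tag)  == 10003 */
            /* get_atag_from_tag(tag) == atag */
            /* valid_atag(atag) holds: no uid bits are set */
            (void)tag;
    }
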
142+/*---------------------------------------------------------------------------*/
143+
144+/*
145+ * Maximum number of socket tags that a UID is allowed to have active.
146+ * Multiple processes belonging to the same UID contribute towards this limit.
147+ * Special UIDs that can impersonate a UID also contribute (e.g. download
148+ * manager, ...)
149+ */
150+#define DEFAULT_MAX_SOCK_TAGS 1024
151+
152+/*
153+ * For now we only track 2 sets of counters.
154+ * The default set is 0.
155+ * Userspace can activate another set for a given uid being tracked.
156+ */
157+#define IFS_MAX_COUNTER_SETS 2
158+
159+enum ifs_tx_rx {
160+ IFS_TX,
161+ IFS_RX,
162+ IFS_MAX_DIRECTIONS
163+};
164+
165+/* For now, TCP, UDP, the rest */
166+enum ifs_proto {
167+ IFS_TCP,
168+ IFS_UDP,
169+ IFS_PROTO_OTHER,
170+ IFS_MAX_PROTOS
171+};
172+
173+struct byte_packet_counters {
174+ uint64_t bytes;
175+ uint64_t packets;
176+};
177+
178+struct data_counters {
179+ struct byte_packet_counters bpc[IFS_MAX_COUNTER_SETS][IFS_MAX_DIRECTIONS][IFS_MAX_PROTOS];
180+};
181+
182+static inline uint64_t dc_sum_bytes(struct data_counters *counters,
183+ int set,
184+ enum ifs_tx_rx direction)
185+{
186+ return counters->bpc[set][direction][IFS_TCP].bytes
187+ + counters->bpc[set][direction][IFS_UDP].bytes
188+ + counters->bpc[set][direction][IFS_PROTO_OTHER].bytes;
189+}
190+
191+static inline uint64_t dc_sum_packets(struct data_counters *counters,
192+ int set,
193+ enum ifs_tx_rx direction)
194+{
195+ return counters->bpc[set][direction][IFS_TCP].packets
196+ + counters->bpc[set][direction][IFS_UDP].packets
197+ + counters->bpc[set][direction][IFS_PROTO_OTHER].packets;
198+}
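
For instance, a grand total across both counter sets can be derived from these helpers (illustrative helper, not in the patch; IFS_MAX_COUNTER_SETS is 2, so sets 0 and 1 cover everything):

    static inline uint64_t dc_total_rx_bytes(struct data_counters *counters)
    {
            return dc_sum_bytes(counters, 0, IFS_RX)
                    + dc_sum_bytes(counters, 1, IFS_RX);
    }
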
199+
200+
201+/* Generic X based nodes used as a base for rb_tree ops */
202+struct tag_node {
203+ struct rb_node node;
204+ tag_t tag;
205+};
206+
207+struct tag_stat {
208+ struct tag_node tn;
209+ struct data_counters counters;
210+ /*
211+ * If this tag is acct_tag based, we need to count against the
212+ * matching parent uid_tag.
213+ */
214+ struct data_counters *parent_counters;
215+};
216+
217+struct iface_stat {
218+ struct list_head list; /* in iface_stat_list */
219+ char *ifname;
220+ bool active;
221+ /* net_dev is only valid for active iface_stat */
222+ struct net_device *net_dev;
223+
224+ struct byte_packet_counters totals_via_dev[IFS_MAX_DIRECTIONS];
225+ struct data_counters totals_via_skb;
226+ /*
227+ * We keep the last_known, because some devices reset their counters
228+ * just before NETDEV_UP, while some will reset just before
229+ * NETDEV_REGISTER (which is more normal).
230+ * So now, if the device didn't do a NETDEV_UNREGISTER and we see
231+ * its current dev stats smaller than what was previously known, we
232+ * assume an UNREGISTER and just use the last_known.
233+ */
234+ struct byte_packet_counters last_known[IFS_MAX_DIRECTIONS];
235+ /* last_known is usable when last_known_valid is true */
236+ bool last_known_valid;
237+
238+ struct proc_dir_entry *proc_ptr;
239+
240+ struct rb_root tag_stat_tree;
241+ spinlock_t tag_stat_list_lock;
242+};
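
A hedged sketch of the reset heuristic the last_known comment describes; the helper name and exact comparison are illustrative, and the real update logic lives in xt_qtaguid.c:

    static inline bool dev_stats_look_reset(const struct iface_stat *is,
                                            uint64_t cur_rx_bytes,
                                            uint64_t cur_tx_bytes)
    {
            /* Counters going backwards imply a missed NETDEV_UNREGISTER. */
            return is->last_known_valid
                    && (cur_rx_bytes < is->last_known[IFS_RX].bytes
                        || cur_tx_bytes < is->last_known[IFS_TX].bytes);
    }
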
243+
244+/* This is needed to create proc_dir_entries from atomic context. */
245+struct iface_stat_work {
246+ struct work_struct iface_work;
247+ struct iface_stat *iface_entry;
248+};
249+
250+/*
251+ * Track the tag that this socket is transferring data for, which is not necessarily
252+ * the uid that owns the socket.
253+ * This is the tag against which tag_stat.counters will be billed.
254+ * These structs need to be looked up by sock and pid.
255+ */
256+struct sock_tag {
257+ struct rb_node sock_node;
258+ struct sock *sk; /* Only used as a number, never dereferenced */
259+ /* Used to associate with a given pid */
260+ struct list_head list; /* in proc_qtu_data.sock_tag_list */
261+ pid_t pid;
262+
263+ tag_t tag;
264+};
265+
266+struct qtaguid_event_counts {
267+ /* Various successful events */
268+ atomic64_t sockets_tagged;
269+ atomic64_t sockets_untagged;
270+ atomic64_t counter_set_changes;
271+ atomic64_t delete_cmds;
272+ atomic64_t iface_events; /* Number of NETDEV_* events handled */
273+
274+ atomic64_t match_calls; /* Number of times iptables called mt */
275+ /* Number of times iptables called mt from pre or post routing hooks */
276+ atomic64_t match_calls_prepost;
277+ /*
278+ * match_found_sk_*: numbers related to the netfilter matching
279+ * function finding a sock for the sk_buff.
280+ * Total skbs processed is sum(match_found*).
281+ */
282+ atomic64_t match_found_sk; /* An sk was already in the sk_buff. */
283+ /* The connection tracker had or didn't have the sk. */
284+ atomic64_t match_found_sk_in_ct;
285+ atomic64_t match_found_no_sk_in_ct;
286+ /*
287+ * No sk could be found. No apparent owner. Could happen with
288+ * unsolicited traffic.
289+ */
290+ atomic64_t match_no_sk;
291+ /*
292+ * The file ptr in the sk_socket wasn't there and we couldn't get GID.
293+ * This might happen for traffic while the socket is being closed.
294+ */
295+ atomic64_t match_no_sk_gid;
296+};
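
Per the comment above, the total number of skbs the match processed is the sum of the match_found* counters; an illustrative helper (not in the patch):

    static inline uint64_t qtaguid_total_skbs_matched(struct qtaguid_event_counts *ec)
    {
            return atomic64_read(&ec->match_found_sk)
                    + atomic64_read(&ec->match_found_sk_in_ct)
                    + atomic64_read(&ec->match_found_no_sk_in_ct);
    }
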
297+
298+/* Track which counter set (active_set) is active for the given tag. */
299+struct tag_counter_set {
300+ struct tag_node tn;
301+ int active_set;
302+};
303+
304+/*----------------------------------------------*/
305+/*
306+ * The qtu uid data is used to track resources that are created directly or
307+ * indirectly by processes (uid tracked).
308+ * It is shared by the processes with the same uid.
309+ * Some of the resources will be counted to prevent further rogue allocations,
310+ * some will need freeing once the owner process (uid) exits.
311+ */
312+struct uid_tag_data {
313+ struct rb_node node;
314+ uid_t uid;
315+
316+ /*
317+ * For the uid, how many accounting tags have been set.
318+ */
319+ int num_active_tags;
320+ /* Track the number of proc_qtu_data that reference it */
321+ int num_pqd;
322+ struct rb_root tag_ref_tree;
323+ /* No tag_node_tree_lock; use uid_tag_data_tree_lock */
324+};
325+
326+struct tag_ref {
327+ struct tag_node tn;
328+
329+ /*
330+ * This tracks the number of active sockets that have a tag on them
331+ * which matches this tag_ref.tn.tag.
332+ * A tag ref can live on after the sockets are untagged.
333+ * A tag ref can only be removed during a tag delete command.
334+ */
335+ int num_sock_tags;
336+};
337+
338+struct proc_qtu_data {
339+ struct rb_node node;
340+ pid_t pid;
341+
342+ struct uid_tag_data *parent_tag_data;
343+
344+ /* Tracks the sock_tags that need freeing upon this proc's death */
345+ struct list_head sock_tag_list;
346+ /* No spinlock_t sock_tag_list_lock; use the global one. */
347+};
348+
349+/*----------------------------------------------*/
350+#endif /* ifndef __XT_QTAGUID_INTERNAL_H__ */
--- /dev/null
+++ b/net/netfilter/xt_qtaguid_print.c
@@ -0,0 +1,565 @@
1+/*
2+ * Pretty-printing support for the iptables xt_qtaguid module.
3+ *
4+ * (C) 2011 Google, Inc
5+ *
6+ * This program is free software; you can redistribute it and/or modify
7+ * it under the terms of the GNU General Public License version 2 as
8+ * published by the Free Software Foundation.
9+ */
10+
11+/*
12+ * Most of the functions in this file just waste time if DEBUG is not defined.
13+ * The matching xt_qtaguid_print.h will static-inline empty funcs if the needed
14+ * debug flags are not defined.
15+ * Those funcs that fail to allocate memory will panic as there is no need to
16+ * hobble along just pretending to do the requested work.
17+ */
18+
19+#define DEBUG
20+
21+#include <linux/fs.h>
22+#include <linux/gfp.h>
23+#include <linux/net.h>
24+#include <linux/rbtree.h>
25+#include <linux/slab.h>
26+#include <linux/spinlock_types.h>
27+#include <net/sock.h>
28+
29+#include "xt_qtaguid_internal.h"
30+#include "xt_qtaguid_print.h"
31+
32+#ifdef DDEBUG
33+
34+static void _bug_on_err_or_null(void *ptr)
35+{
36+ if (IS_ERR_OR_NULL(ptr)) {
37+ pr_err("qtaguid: kmalloc failed\n");
38+ BUG();
39+ }
40+}
41+
42+char *pp_tag_t(tag_t *tag)
43+{
44+ char *res;
45+
46+ if (!tag)
47+ res = kasprintf(GFP_ATOMIC, "tag_t@null{}");
48+ else
49+ res = kasprintf(GFP_ATOMIC,
50+ "tag_t@%p{tag=0x%llx, uid=%u}",
51+ tag, *tag, get_uid_from_tag(*tag));
52+ _bug_on_err_or_null(res);
53+ return res;
54+}
55+
56+char *pp_data_counters(struct data_counters *dc, bool showValues)
57+{
58+ char *res;
59+
60+ if (!dc)
61+ res = kasprintf(GFP_ATOMIC, "data_counters@null{}");
62+ else if (showValues)
63+ res = kasprintf(
64+ GFP_ATOMIC, "data_counters@%p{"
65+ "set0{"
66+ "rx{"
67+ "tcp{b=%llu, p=%llu}, "
68+ "udp{b=%llu, p=%llu},"
69+ "other{b=%llu, p=%llu}}, "
70+ "tx{"
71+ "tcp{b=%llu, p=%llu}, "
72+ "udp{b=%llu, p=%llu},"
73+ "other{b=%llu, p=%llu}}}, "
74+ "set1{"
75+ "rx{"
76+ "tcp{b=%llu, p=%llu}, "
77+ "udp{b=%llu, p=%llu},"
78+ "other{b=%llu, p=%llu}}, "
79+ "tx{"
80+ "tcp{b=%llu, p=%llu}, "
81+ "udp{b=%llu, p=%llu},"
82+ "other{b=%llu, p=%llu}}}}",
83+ dc,
84+ dc->bpc[0][IFS_RX][IFS_TCP].bytes,
85+ dc->bpc[0][IFS_RX][IFS_TCP].packets,
86+ dc->bpc[0][IFS_RX][IFS_UDP].bytes,
87+ dc->bpc[0][IFS_RX][IFS_UDP].packets,
88+ dc->bpc[0][IFS_RX][IFS_PROTO_OTHER].bytes,
89+ dc->bpc[0][IFS_RX][IFS_PROTO_OTHER].packets,
90+ dc->bpc[0][IFS_TX][IFS_TCP].bytes,
91+ dc->bpc[0][IFS_TX][IFS_TCP].packets,
92+ dc->bpc[0][IFS_TX][IFS_UDP].bytes,
93+ dc->bpc[0][IFS_TX][IFS_UDP].packets,
94+ dc->bpc[0][IFS_TX][IFS_PROTO_OTHER].bytes,
95+ dc->bpc[0][IFS_TX][IFS_PROTO_OTHER].packets,
96+ dc->bpc[1][IFS_RX][IFS_TCP].bytes,
97+ dc->bpc[1][IFS_RX][IFS_TCP].packets,
98+ dc->bpc[1][IFS_RX][IFS_UDP].bytes,
99+ dc->bpc[1][IFS_RX][IFS_UDP].packets,
100+ dc->bpc[1][IFS_RX][IFS_PROTO_OTHER].bytes,
101+ dc->bpc[1][IFS_RX][IFS_PROTO_OTHER].packets,
102+ dc->bpc[1][IFS_TX][IFS_TCP].bytes,
103+ dc->bpc[1][IFS_TX][IFS_TCP].packets,
104+ dc->bpc[1][IFS_TX][IFS_UDP].bytes,
105+ dc->bpc[1][IFS_TX][IFS_UDP].packets,
106+ dc->bpc[1][IFS_TX][IFS_PROTO_OTHER].bytes,
107+ dc->bpc[1][IFS_TX][IFS_PROTO_OTHER].packets);
108+ else
109+ res = kasprintf(GFP_ATOMIC, "data_counters@%p{...}", dc);
110+ _bug_on_err_or_null(res);
111+ return res;
112+}
113+
114+char *pp_tag_node(struct tag_node *tn)
115+{
116+ char *tag_str;
117+ char *res;
118+
119+ if (!tn) {
120+ res = kasprintf(GFP_ATOMIC, "tag_node@null{}");
121+ _bug_on_err_or_null(res);
122+ return res;
123+ }
124+ tag_str = pp_tag_t(&tn->tag);
125+ res = kasprintf(GFP_ATOMIC,
126+ "tag_node@%p{tag=%s}",
127+ tn, tag_str);
128+ _bug_on_err_or_null(res);
129+ kfree(tag_str);
130+ return res;
131+}
132+
133+char *pp_tag_ref(struct tag_ref *tr)
134+{
135+ char *tn_str;
136+ char *res;
137+
138+ if (!tr) {
139+ res = kasprintf(GFP_ATOMIC, "tag_ref@null{}");
140+ _bug_on_err_or_null(res);
141+ return res;
142+ }
143+ tn_str = pp_tag_node(&tr->tn);
144+ res = kasprintf(GFP_ATOMIC,
145+ "tag_ref@%p{%s, num_sock_tags=%d}",
146+ tr, tn_str, tr->num_sock_tags);
147+ _bug_on_err_or_null(res);
148+ kfree(tn_str);
149+ return res;
150+}
151+
152+char *pp_tag_stat(struct tag_stat *ts)
153+{
154+ char *tn_str;
155+ char *counters_str;
156+ char *parent_counters_str;
157+ char *res;
158+
159+ if (!ts) {
160+ res = kasprintf(GFP_ATOMIC, "tag_stat@null{}");
161+ _bug_on_err_or_null(res);
162+ return res;
163+ }
164+ tn_str = pp_tag_node(&ts->tn);
165+ counters_str = pp_data_counters(&ts->counters, true);
166+ parent_counters_str = pp_data_counters(ts->parent_counters, false);
167+ res = kasprintf(GFP_ATOMIC,
168+ "tag_stat@%p{%s, counters=%s, parent_counters=%s}",
169+ ts, tn_str, counters_str, parent_counters_str);
170+ _bug_on_err_or_null(res);
171+ kfree(tn_str);
172+ kfree(counters_str);
173+ kfree(parent_counters_str);
174+ return res;
175+}
176+
177+char *pp_iface_stat(struct iface_stat *is)
178+{
179+ char *res;
180+ if (!is) {
181+ res = kasprintf(GFP_ATOMIC, "iface_stat@null{}");
182+ } else {
183+ struct data_counters *cnts = &is->totals_via_skb;
184+ res = kasprintf(GFP_ATOMIC, "iface_stat@%p{"
185+ "list=list_head{...}, "
186+ "ifname=%s, "
187+ "total_dev={rx={bytes=%llu, "
188+ "packets=%llu}, "
189+ "tx={bytes=%llu, "
190+ "packets=%llu}}, "
191+ "total_skb={rx={bytes=%llu, "
192+ "packets=%llu}, "
193+ "tx={bytes=%llu, "
194+ "packets=%llu}}, "
195+ "last_known_valid=%d, "
196+ "last_known={rx={bytes=%llu, "
197+ "packets=%llu}, "
198+ "tx={bytes=%llu, "
199+ "packets=%llu}}, "
200+ "active=%d, "
201+ "net_dev=%p, "
202+ "proc_ptr=%p, "
203+ "tag_stat_tree=rb_root{...}}",
204+ is,
205+ is->ifname,
206+ is->totals_via_dev[IFS_RX].bytes,
207+ is->totals_via_dev[IFS_RX].packets,
208+ is->totals_via_dev[IFS_TX].bytes,
209+ is->totals_via_dev[IFS_TX].packets,
210+ dc_sum_bytes(cnts, 0, IFS_RX),
211+ dc_sum_packets(cnts, 0, IFS_RX),
212+ dc_sum_bytes(cnts, 0, IFS_TX),
213+ dc_sum_packets(cnts, 0, IFS_TX),
214+ is->last_known_valid,
215+ is->last_known[IFS_RX].bytes,
216+ is->last_known[IFS_RX].packets,
217+ is->last_known[IFS_TX].bytes,
218+ is->last_known[IFS_TX].packets,
219+ is->active,
220+ is->net_dev,
221+ is->proc_ptr);
222+ }
223+ _bug_on_err_or_null(res);
224+ return res;
225+}
226+
227+char *pp_sock_tag(struct sock_tag *st)
228+{
229+ char *tag_str;
230+ char *res;
231+
232+ if (!st) {
233+ res = kasprintf(GFP_ATOMIC, "sock_tag@null{}");
234+ _bug_on_err_or_null(res);
235+ return res;
236+ }
237+ tag_str = pp_tag_t(&st->tag);
238+ res = kasprintf(GFP_ATOMIC, "sock_tag@%p{"
239+ "sock_node=rb_node{...}, "
240+ "sk=%p (f_count=%d), list=list_head{...}, "
241+ "pid=%u, tag=%s}",
242+ st, st->sk, refcount_read(&st->sk->sk_refcnt),
243+ st->pid, tag_str);
244+ _bug_on_err_or_null(res);
245+ kfree(tag_str);
246+ return res;
247+}
248+
249+char *pp_uid_tag_data(struct uid_tag_data *utd)
250+{
251+ char *res;
252+
253+ if (!utd)
254+ res = kasprintf(GFP_ATOMIC, "uid_tag_data@null{}");
255+ else
256+ res = kasprintf(GFP_ATOMIC, "uid_tag_data@%p{"
257+ "uid=%u, num_active_acct_tags=%d, "
258+ "num_pqd=%d, "
259+ "tag_node_tree=rb_root{...}, "
260+ "proc_qtu_data_tree=rb_root{...}}",
261+ utd, utd->uid,
262+ utd->num_active_tags, utd->num_pqd);
263+ _bug_on_err_or_null(res);
264+ return res;
265+}
266+
267+char *pp_proc_qtu_data(struct proc_qtu_data *pqd)
268+{
269+ char *parent_tag_data_str;
270+ char *res;
271+
272+ if (!pqd) {
273+ res = kasprintf(GFP_ATOMIC, "proc_qtu_data@null{}");
274+ _bug_on_err_or_null(res);
275+ return res;
276+ }
277+ parent_tag_data_str = pp_uid_tag_data(pqd->parent_tag_data);
278+ res = kasprintf(GFP_ATOMIC, "proc_qtu_data@%p{"
279+ "node=rb_node{...}, pid=%u, "
280+ "parent_tag_data=%s, "
281+ "sock_tag_list=list_head{...}}",
282+ pqd, pqd->pid, parent_tag_data_str
283+ );
284+ _bug_on_err_or_null(res);
285+ kfree(parent_tag_data_str);
286+ return res;
287+}
288+
289+/*------------------------------------------*/
290+void prdebug_sock_tag_tree(int indent_level,
291+ struct rb_root *sock_tag_tree)
292+{
293+ struct rb_node *node;
294+ struct sock_tag *sock_tag_entry;
295+ char *str;
296+
297+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
298+ return;
299+
300+ if (RB_EMPTY_ROOT(sock_tag_tree)) {
301+ str = "sock_tag_tree=rb_root{}";
302+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
303+ return;
304+ }
305+
306+ str = "sock_tag_tree=rb_root{";
307+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
308+ indent_level++;
309+ for (node = rb_first(sock_tag_tree);
310+ node;
311+ node = rb_next(node)) {
312+ sock_tag_entry = rb_entry(node, struct sock_tag, sock_node);
313+ str = pp_sock_tag(sock_tag_entry);
314+ pr_debug("%*d: %s,\n", indent_level*2, indent_level, str);
315+ kfree(str);
316+ }
317+ indent_level--;
318+ str = "}";
319+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
320+}
321+
322+void prdebug_sock_tag_list(int indent_level,
323+ struct list_head *sock_tag_list)
324+{
325+ struct sock_tag *sock_tag_entry;
326+ char *str;
327+
328+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
329+ return;
330+
331+ if (list_empty(sock_tag_list)) {
332+ str = "sock_tag_list=list_head{}";
333+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
334+ return;
335+ }
336+
337+ str = "sock_tag_list=list_head{";
338+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
339+ indent_level++;
340+ list_for_each_entry(sock_tag_entry, sock_tag_list, list) {
341+ str = pp_sock_tag(sock_tag_entry);
342+ pr_debug("%*d: %s,\n", indent_level*2, indent_level, str);
343+ kfree(str);
344+ }
345+ indent_level--;
346+ str = "}";
347+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
348+}
349+
350+void prdebug_proc_qtu_data_tree(int indent_level,
351+ struct rb_root *proc_qtu_data_tree)
352+{
353+ char *str;
354+ struct rb_node *node;
355+ struct proc_qtu_data *proc_qtu_data_entry;
356+
357+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
358+ return;
359+
360+ if (RB_EMPTY_ROOT(proc_qtu_data_tree)) {
361+ str = "proc_qtu_data_tree=rb_root{}";
362+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
363+ return;
364+ }
365+
366+ str = "proc_qtu_data_tree=rb_root{";
367+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
368+ indent_level++;
369+ for (node = rb_first(proc_qtu_data_tree);
370+ node;
371+ node = rb_next(node)) {
372+ proc_qtu_data_entry = rb_entry(node,
373+ struct proc_qtu_data,
374+ node);
375+ str = pp_proc_qtu_data(proc_qtu_data_entry);
376+ pr_debug("%*d: %s,\n", indent_level*2, indent_level,
377+ str);
378+ kfree(str);
379+ indent_level++;
380+ prdebug_sock_tag_list(indent_level,
381+ &proc_qtu_data_entry->sock_tag_list);
382+ indent_level--;
383+
384+ }
385+ indent_level--;
386+ str = "}";
387+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
388+}
389+
390+void prdebug_tag_ref_tree(int indent_level, struct rb_root *tag_ref_tree)
391+{
392+ char *str;
393+ struct rb_node *node;
394+ struct tag_ref *tag_ref_entry;
395+
396+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
397+ return;
398+
399+ if (RB_EMPTY_ROOT(tag_ref_tree)) {
400+ str = "tag_ref_tree{}";
401+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
402+ return;
403+ }
404+
405+ str = "tag_ref_tree{";
406+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
407+ indent_level++;
408+ for (node = rb_first(tag_ref_tree);
409+ node;
410+ node = rb_next(node)) {
411+ tag_ref_entry = rb_entry(node,
412+ struct tag_ref,
413+ tn.node);
414+ str = pp_tag_ref(tag_ref_entry);
415+ pr_debug("%*d: %s,\n", indent_level*2, indent_level,
416+ str);
417+ kfree(str);
418+ }
419+ indent_level--;
420+ str = "}";
421+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
422+}
423+
424+void prdebug_uid_tag_data_tree(int indent_level,
425+ struct rb_root *uid_tag_data_tree)
426+{
427+ char *str;
428+ struct rb_node *node;
429+ struct uid_tag_data *uid_tag_data_entry;
430+
431+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
432+ return;
433+
434+ if (RB_EMPTY_ROOT(uid_tag_data_tree)) {
435+ str = "uid_tag_data_tree=rb_root{}";
436+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
437+ return;
438+ }
439+
440+ str = "uid_tag_data_tree=rb_root{";
441+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
442+ indent_level++;
443+ for (node = rb_first(uid_tag_data_tree);
444+ node;
445+ node = rb_next(node)) {
446+ uid_tag_data_entry = rb_entry(node, struct uid_tag_data,
447+ node);
448+ str = pp_uid_tag_data(uid_tag_data_entry);
449+ pr_debug("%*d: %s,\n", indent_level*2, indent_level, str);
450+ kfree(str);
451+ if (!RB_EMPTY_ROOT(&uid_tag_data_entry->tag_ref_tree)) {
452+ indent_level++;
453+ prdebug_tag_ref_tree(indent_level,
454+ &uid_tag_data_entry->tag_ref_tree);
455+ indent_level--;
456+ }
457+ }
458+ indent_level--;
459+ str = "}";
460+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
461+}
462+
463+void prdebug_tag_stat_tree(int indent_level,
464+ struct rb_root *tag_stat_tree)
465+{
466+ char *str;
467+ struct rb_node *node;
468+ struct tag_stat *ts_entry;
469+
470+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
471+ return;
472+
473+ if (RB_EMPTY_ROOT(tag_stat_tree)) {
474+ str = "tag_stat_tree{}";
475+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
476+ return;
477+ }
478+
479+ str = "tag_stat_tree{";
480+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
481+ indent_level++;
482+ for (node = rb_first(tag_stat_tree);
483+ node;
484+ node = rb_next(node)) {
485+ ts_entry = rb_entry(node, struct tag_stat, tn.node);
486+ str = pp_tag_stat(ts_entry);
487+ pr_debug("%*d: %s\n", indent_level*2, indent_level,
488+ str);
489+ kfree(str);
490+ }
491+ indent_level--;
492+ str = "}";
493+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
494+}
495+
496+void prdebug_iface_stat_list(int indent_level,
497+ struct list_head *iface_stat_list)
498+{
499+ char *str;
500+ struct iface_stat *iface_entry;
501+
502+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
503+ return;
504+
505+ if (list_empty(iface_stat_list)) {
506+ str = "iface_stat_list=list_head{}";
507+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
508+ return;
509+ }
510+
511+ str = "iface_stat_list=list_head{";
512+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
513+ indent_level++;
514+ list_for_each_entry(iface_entry, iface_stat_list, list) {
515+ str = pp_iface_stat(iface_entry);
516+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
517+ kfree(str);
518+
519+ spin_lock_bh(&iface_entry->tag_stat_list_lock);
520+ if (!RB_EMPTY_ROOT(&iface_entry->tag_stat_tree)) {
521+ indent_level++;
522+ prdebug_tag_stat_tree(indent_level,
523+ &iface_entry->tag_stat_tree);
524+ indent_level--;
525+ }
526+ spin_unlock_bh(&iface_entry->tag_stat_list_lock);
527+ }
528+ indent_level--;
529+ str = "}";
530+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
531+}
532+
533+#endif /* ifdef DDEBUG */
534+/*------------------------------------------*/
535+static const char * const netdev_event_strings[] = {
536+ "netdev_unknown",
537+ "NETDEV_UP",
538+ "NETDEV_DOWN",
539+ "NETDEV_REBOOT",
540+ "NETDEV_CHANGE",
541+ "NETDEV_REGISTER",
542+ "NETDEV_UNREGISTER",
543+ "NETDEV_CHANGEMTU",
544+ "NETDEV_CHANGEADDR",
545+ "NETDEV_GOING_DOWN",
546+ "NETDEV_CHANGENAME",
547+ "NETDEV_FEAT_CHANGE",
548+ "NETDEV_BONDING_FAILOVER",
549+ "NETDEV_PRE_UP",
550+ "NETDEV_PRE_TYPE_CHANGE",
551+ "NETDEV_POST_TYPE_CHANGE",
552+ "NETDEV_POST_INIT",
553+ "NETDEV_UNREGISTER_BATCH",
554+ "NETDEV_RELEASE",
555+ "NETDEV_NOTIFY_PEERS",
556+ "NETDEV_JOIN",
557+};
558+
559+const char *netdev_evt_str(int netdev_event)
560+{
561+ if (netdev_event < 0
562+ || netdev_event >= ARRAY_SIZE(netdev_event_strings))
563+ return "bad event num";
564+ return netdev_event_strings[netdev_event];
565+}
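
A hedged usage sketch of netdev_evt_str() from a netdevice notifier callback; the callback shape assumes <linux/notifier.h> and is not part of the patch:

    static int example_netdev_event(struct notifier_block *nb,
                                    unsigned long event, void *ptr)
    {
            pr_debug("qtaguid: netdev event %s\n", netdev_evt_str((int)event));
            return NOTIFY_DONE;
    }
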
--- /dev/null
+++ b/net/netfilter/xt_qtaguid_print.h
@@ -0,0 +1,120 @@
1+/*
2+ * Pretty-printing support for the iptables xt_qtaguid module.
3+ *
4+ * (C) 2011 Google, Inc
5+ *
6+ * This program is free software; you can redistribute it and/or modify
7+ * it under the terms of the GNU General Public License version 2 as
8+ * published by the Free Software Foundation.
9+ */
10+#ifndef __XT_QTAGUID_PRINT_H__
11+#define __XT_QTAGUID_PRINT_H__
12+
13+#include "xt_qtaguid_internal.h"
14+
15+#ifdef DDEBUG
16+
17+char *pp_tag_t(tag_t *tag);
18+char *pp_data_counters(struct data_counters *dc, bool showValues);
19+char *pp_tag_node(struct tag_node *tn);
20+char *pp_tag_ref(struct tag_ref *tr);
21+char *pp_tag_stat(struct tag_stat *ts);
22+char *pp_iface_stat(struct iface_stat *is);
23+char *pp_sock_tag(struct sock_tag *st);
24+char *pp_uid_tag_data(struct uid_tag_data *qtd);
25+char *pp_proc_qtu_data(struct proc_qtu_data *pqd);
26+
27+/*------------------------------------------*/
28+void prdebug_sock_tag_list(int indent_level,
29+ struct list_head *sock_tag_list);
30+void prdebug_sock_tag_tree(int indent_level,
31+ struct rb_root *sock_tag_tree);
32+void prdebug_proc_qtu_data_tree(int indent_level,
33+ struct rb_root *proc_qtu_data_tree);
34+void prdebug_tag_ref_tree(int indent_level, struct rb_root *tag_ref_tree);
35+void prdebug_uid_tag_data_tree(int indent_level,
36+ struct rb_root *uid_tag_data_tree);
37+void prdebug_tag_stat_tree(int indent_level,
38+ struct rb_root *tag_stat_tree);
39+void prdebug_iface_stat_list(int indent_level,
40+ struct list_head *iface_stat_list);
41+
42+#else
43+
44+/*------------------------------------------*/
45+static inline char *pp_tag_t(tag_t *tag)
46+{
47+ return NULL;
48+}
49+static inline char *pp_data_counters(struct data_counters *dc, bool showValues)
50+{
51+ return NULL;
52+}
53+static inline char *pp_tag_node(struct tag_node *tn)
54+{
55+ return NULL;
56+}
57+static inline char *pp_tag_ref(struct tag_ref *tr)
58+{
59+ return NULL;
60+}
61+static inline char *pp_tag_stat(struct tag_stat *ts)
62+{
63+ return NULL;
64+}
65+static inline char *pp_iface_stat(struct iface_stat *is)
66+{
67+ return NULL;
68+}
69+static inline char *pp_sock_tag(struct sock_tag *st)
70+{
71+ return NULL;
72+}
73+static inline char *pp_uid_tag_data(struct uid_tag_data *qtd)
74+{
75+ return NULL;
76+}
77+static inline char *pp_proc_qtu_data(struct proc_qtu_data *pqd)
78+{
79+ return NULL;
80+}
81+
82+/*------------------------------------------*/
83+static inline
84+void prdebug_sock_tag_list(int indent_level,
85+ struct list_head *sock_tag_list)
86+{
87+}
88+static inline
89+void prdebug_sock_tag_tree(int indent_level,
90+ struct rb_root *sock_tag_tree)
91+{
92+}
93+static inline
94+void prdebug_proc_qtu_data_tree(int indent_level,
95+ struct rb_root *proc_qtu_data_tree)
96+{
97+}
98+static inline
99+void prdebug_tag_ref_tree(int indent_level, struct rb_root *tag_ref_tree)
100+{
101+}
102+static inline
103+void prdebug_uid_tag_data_tree(int indent_level,
104+ struct rb_root *uid_tag_data_tree)
105+{
106+}
107+static inline
108+void prdebug_tag_stat_tree(int indent_level,
109+ struct rb_root *tag_stat_tree)
110+{
111+}
112+static inline
113+void prdebug_iface_stat_list(int indent_level,
114+ struct list_head *iface_stat_list)
115+{
116+}
117+#endif
118+/*------------------------------------------*/
119+const char *netdev_evt_str(int netdev_event);
120+#endif /* ifndef __XT_QTAGUID_PRINT_H__ */