Better way to get tag stats?

I have over 4000 posts. I am trying to query all the posts and get the count of tags each post has and sum up posts count based on number of tags the post has in dashboard. The posts count shows up properly when post_per_page is less than 2000 but beyond 2000 , the query timesout . It just shows ‘0’ for all.

Code

  $args = array(
    'posts_per_page' => 4000,
    'post_status'  => 'publish',
  );

  $zerotags = 0;
  $onetag = 0;
  $twotags = 0;
  $morethantwo = 0;
  $sixtags_plus = 0;

  $query = new WP_Query( $args );
  while ( $query->have_posts() ) : $query->the_post();

     $posttags = get_the_tags();
     $tag_count = 0;

     foreach($posttags as $tag) {
       $tag_count++;
     }
     if ($tag_count == 0) {
       $zerotags++;
     }
     if ($tag_count == 1) {
       $onetag++;
     }
     if ($tag_count == 2) {
       $twotags++;
     }
     if ($tag_count > 2 && $tag_count < 6) {
       $morethantwo++;
     }
     if ($tag_count >= 6) {
       $sixtags_plus++;
     }

 endwhile;

 echo 'Zero Tags : '.$zerotags.'posts';
 echo 'One Tag : '.$onetag.'posts';
 echo 'Two Tags : '.$twotags.'posts';
 echo 'More than 2 and less than 6 : '.$morethantwo.'posts';
 echo 'More than 6 tags : '.$sixtags_plus.'posts';

Is there a better approach to query this so that the timeout doesn’t occur?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

I addressed a similar problem not long ago – it’s all in the memory:

$post_ids = get_posts(
    array(
        'posts_per_page' => -1,
        'post_status'  => 'publish',
        'fields' => 'ids', // Just grab IDs instead of pulling 1000's of objects into memory
    )
);

update_object_term_cache( $post_ids, 'post' ); // Cache all the post terms in one query, memory should be ok

foreach ( $post_ids as $post_id ) {
    if ( ! $tags = get_object_term_cache( $post_id, 'post_tag' ) ) {
       $zerotags++;
    } else {
        $tag_count = count( $tags );

        if ( $tag_count === 1 ) {
            $onetag++;
        } elseif ( $tag_count === 2 ) {
            $twotags++;
        } elseif ( $tag_count >= 6 ) {
            $sixtags_plus++;
        }

        if ( $tag_count > 2 && $tag_count < 6 ) {
            $morethantwo++;
        }
    }
}

Update: Switched get_the_tags to get_object_term_cache – otherwise we lose all our hard work! (the former hits get_post, which will hit the db on every iteration and chuck the post object into memory – props @Pieter Goosen).

Update 2: The second argument for update_object_term_cache should be the post type, not the taxonomy.

Method 2

You are hiting the db with a 500 mile per hour hurricane, no wonder your query times out.

Here is an idea or two to speed things up

  • Add 'fields' => 'ids', to your WP_Query arguments. This will speed up your query dramatically. This will only return the post id’s, and this is the only thing that you actually need
  • Use wp_get_post_terms() to get the post tags. The third parameter takes an array of arguments, one beign fields which you can also set to just return ids which will also speed up your query as it will also just return tag ID’s and not the complete tag object
  • Use transients to save your results and flush them when a new post is published, or when a post is deleted, undeleted or updated. Use transition_post_status

EDIT- IDEA TO CODE TRANSITION

Setup the function to delete the tansient if a new post is published, or when a post is deleted or undeleted or updated

In your functions.php

add_action( 'transition_post_status', function ()
{
        global $wpdb;
        $wpdb->query( "DELETE FROM $wpdb->options WHERE `option_name` LIKE ('_transient%_tag_list_%')" );
        $wpdb->query( "DELETE FROM $wpdb->options WHERE `option_name` LIKE ('_transient_timeout%_tag_list_%')" );
});

Get the tag count and add it to a transient

function get_term_post_count( $taxonomy = 'post_tag', $post_type = 'post' )
{
    if ( false === ( $total_counts = get_transient( 'tag_list_' . md5( $taxonomy . $post_type ) ) ) ) {

        if ( !taxonomy_exists( $taxonomy ) )
            return $total_counts = null;

        $args = [
            'nopaging' => true, //Gets all posts
            'fields' => 'ids'
        ];
        $q = new WP_Query( $args );

        if ( empty( $q->posts ) )
              return $total_counts = null;

        update_object_term_cache( $q->posts, $post_type );

        foreach ( $q->posts as $single_post ) {

            $tags = get_object_term_cache( $single_post, $taxonomy );

            if ( empty( $tags ) ) {
                $no_tags[] = $single_post;
            } else {
                $count = count( $tags );
                if ( $count == 1 ) {
                    $one[] = $single_post;
                 } elseif ( $count == 2 ) {
                     $two[] = $single_post;
                 } elseif ( $count >= 3 && $count <= 6 ) {
                     $more_than_two[] = $single_post;
                 } elseif ( $count > 6 ) {
                      $more_than_six[] = $single_post;
                 }
            }
        }

       $total_counts = [
            'none' => isset( $no_tags ) ? ( (int) count( $no_tags ) ) : 0,
            'one' => isset( $one ) ? ( (int) count( $one ) ) : 0,
            'two' => isset( $two ) ? ( (int) count( $two ) ) : 0,
            'more_than_two' => isset( $more_than_two ) ? ( (int) count( $more_than_two ) ) : 0,
            'more_than_six' => isset( $more_than_six ) ? ( (int) count( $more_than_six) ) : 0
        ];


    set_transient( 'tag_list_' . md5( $taxonomy . $post_type ), $total_counts, 24 * HOUR_IN_SECONDS );

    return $total_counts;
}

You can use the function as follows in your template

$q = get_term_post_count();
if ( $q !== null ) {
     echo 'Zero Tags : '.$q['none'].'posts </br>'; 
     echo 'One Tag : '.$q['one'].'posts </br>';
     echo 'Two Tags : '.$q['two'].'posts </br>';
     echo 'More than 2 and less than 6 : '.$q['more_than_two'].'posts </br>';
     echo 'More than 6 tags : '.$q['more_than_six'].'posts </br>';
}

FEW IMPORTANT NOTES

  • The code above is untested and might be buggy
  • Requires PHP 5.4 +
  • The first parameter is $taxonomy. You can pass any taxonomy to the code, the deafault is post_tag. The second parameter is $post_type which is set to default post. You can pass any post type to the parameter
  • Modify and abuse as you see fit

EDIT 1

Fixed a couple of minor bugs, the code is now tested and is working

EDIT 2 – PERFORMANCE TESTING

—SCRAPPED—

EDIT 3 thanks to @TheDeadMedic

I have also learned a bit from @TheDeadMedic about update_object_term_cache and get_object_term_cache which increases the performance a lot. I have updated ( stole a bit from @TheDeadMedic, upvoted his answer in return 🙂) my answer with this info. It does hit the memory a bit, but this problem is partly overcome with the use of transients.

The code now gives me

2 queries in 0.09766 seconds.

without using transients

Method 3

Frequency table – custom SQL query:

You can try the following custom query for your posts/terms statistics for a given taxonomy, post status and type:

/**
 * Frequency data: Count how many posts have a given number of terms, 
 * for a given post type, post status and taxonomy.
 *
 * @param string   $taxonomy    Taxonomy slug
 * @param string   $post_status Post status (draft, publish, ...)
 * @param string   $post_type   Post type (post, page, ...)
 * @return array   Array containing freq. data with 'total' (posts) and 'term' counts
 */

function get_post_terms_stats_wpse_184993( 
    $taxonomy = 'post_tag', 
    $post_status = 'publish', 
    $post_type = 'post' 
){
    global $wpdb;
    $sql = " 
        SELECT COUNT( s.terms_per_post ) as total, s.terms_per_post 
        FROM ( 
            SELECT COUNT( tr.object_id ) terms_per_post, tr.object_id 
            FROM {$wpdb->term_relationships} tr 
            LEFT JOIN {$wpdb->term_taxonomy} tt USING( term_taxonomy_id ) 
            LEFT JOIN {$wpdb->posts} p ON p.ID = tr.object_id  
            WHERE     tt.taxonomy = '%s' 
                  AND p.post_status = '%s' 
                  AND p.post_type = '%s'
            GROUP BY tr.object_id 
         ) as s 
         GROUP by s.terms_per_post
         ORDER BY total DESC";

    return $wpdb->get_results( 
        $wpdb->prepare( $sql, $taxonomy, $post_status, $post_type ), 
        ARRAY_A 
    );
}

Example on an install with ~10k posts:

Here’s an example for category in published posts:

$stats = get_post_terms_stats_wpse_184993( 
    $taxonomy    = 'category', 
    $post_status = 'publish', 
    $post_type   = 'post' 
);  

print_r( $stats );

with the following output:

Array
(
    [0] => Array
        (
            [total] => 8173
            [terms_per_post] => 1
        )

    [1] => Array
        (
            [total] => 948
            [terms_per_post] => 2
        )

    [2] => Array
        (
            [total] => 94
            [terms_per_post] => 3
        )

    [3] => Array
        (
            [total] => 2
            [terms_per_post] => 4
        )

    [4] => Array
        (
            [total] => 1
            [terms_per_post] => 6
        )

    [5] => Array
        (
            [total] => 1
            [terms_per_post] => 8
        )

)

We can output it in a HTML table:

foreach( $stats as $row )
{
    $rows .= sprintf( 
        "<tr><td>%d</td><td>%d</td></tr>", 
        $row['total'], 
        $row['terms_per_post'] 
    );
}
printf( "<table><tr><th>#Posts</th><th>#Terms</th></tr>%s</table>", $rows );

with the following output:

stats

So here we can see how many posts have a given number of terms, in order.

Currently the SQL query uses temporary and filesort, so there are definitely opportunities to adjust it. On a 10k posts install this took under 0.2s to run on my small VPS. We could for example remove the post table join to make it faster, but then it would be less flexible.

We can also cache the output, for example with the transients API, as mentioned by @Pieter Goosen in his answer.


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x